1. Introduction. Although $U$-statistics (Halmos, 1946; Hoeffding, 1948) are relatively
simple probabilistic objects, namely averages over an i.i.d. sample $X_1,\dots,X_n$
of measurable functions (kernels) $h(x_1,\dots,x_m)$ of several variables, their asymptotic
theory is only recently attaining a satisfactory degree of completeness: see e.g. Rubin
and Vitale (1980), Giné and Zinn (1994), Zhang (1999) and Latała and Zinn
(1999) on necessary and sufficient conditions for the central limit theorem and the
law of large numbers. We are interested here in the law of the iterated logarithm (LIL)
for $U$-statistics based on canonical (or completely degenerate) kernels, that is, on
kernels whose conditional expectation given any $m-1$ variables is zero, and only
for $m=2$.

$U$-statistics with nondegenerate kernels behave, as is well known, like sums
of independent random variables, and the LIL in this case was proved by Serfling
(1971). The LIL for canonical (or completely degenerate) kernels $h$ with finite
absolute moment of order $2+\delta$, $\delta>0$, was obtained by Dehling, Denker and
Philipp (1984, 1986), and with finite second moment by Dehling (1989) and Arcones
and Giné (1995). Giné and Zhang (1996) showed that there exist degenerate
kernels $h$ with infinite second moment such that, nevertheless, the corresponding
$U$-statistics satisfy the law of the iterated logarithm, and obtained a necessary
integrability condition as well. This last article and Goodman's (1996) also contain
LILs under assumptions that do not imply finiteness of the second moment of $h$,
but that fall quite short of being necessary. The LIL for finite sums of products
$\sum_{i=1}^{k}\lambda_i\phi_i(x_1)\cdots\phi_i(x_m)$ is easier ($Eh^2<\infty$ is necessary) and was considered by
Teicher (1995) for $k=1$ and by Giné and Zhang (1996) for any $k<\infty$. In the
present article the bounded LIL problem is solved for kernels of order 2. Next we
describe our result and comment on its (relatively involved) proof.

In what follows, $X, X_i$, $i\in\mathbb N$, are independent identically distributed random
variables taking values in some measurable space $(S,\mathcal S)$, and $h:S^2\to\mathbb R$ is a
measurable function that we assume, without loss of generality (for our purposes),
symmetric in its entries, that is, $h(x,y)=h(y,x)$ for all $x,y\in S$. When $h$ is
integrable we say that it is canonical, or degenerate, for the law of $X$ if $Eh(X,y)=0$
for almost all $y\in S$ (relative to the law of $X$). The natural LIL normalization for
$U$-statistics corresponding to degenerate kernels of order 2 is $n\log\log n$, as is seen
with the following example. A simple canonical kernel for $S=\mathbb R$ and $X$ integrable
with $EX=0$ is $h(x,y)=xy$. For this example, if moreover $EX^2<\infty$ then, by
the LIL and the law of large numbers for sums of independent random variables, we
have

\[
\limsup_{n}\frac{1}{2n\log\log n}\Bigl|\sum_{1\le i\ne j\le n}X_iX_j\Bigr|
=\limsup_{n}\Bigl(\frac{1}{\sqrt{2n\log\log n}}\sum_{i=1}^{n}X_i\Bigr)^{2}
=\operatorname{Var}X\quad\text{a.s.}
\]
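The example rests on the elementary identity $\sum_{i\ne j}X_iX_j=(\sum_iX_i)^2-\sum_iX_i^2$, which reduces the degenerate $U$-statistic to a squared sum (handled by the classical LIL) minus a sum of squares (handled by the law of large numbers). The following is a minimal sketch of that identity in Python; the function names and the Gaussian sample are illustrative assumptions, not part of the paper.

```python
import math
import random


def offdiag_product_sum(xs):
    """Brute-force sum of x_i * x_j over ordered pairs i != j."""
    n = len(xs)
    return sum(xs[i] * xs[j] for i in range(n) for j in range(n) if i != j)


def offdiag_via_square(xs):
    """Same quantity computed as (sum x_i)^2 - sum x_i^2."""
    s = sum(xs)
    return s * s - sum(x * x for x in xs)


def normalized_statistic(xs):
    """|sum_{i != j} x_i x_j| / (2 n log log n); requires n >= 3."""
    n = len(xs)
    return abs(offdiag_via_square(xs)) / (2 * n * math.log(math.log(n)))


rng = random.Random(0)  # fixed seed, hypothetical sample
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
brute = offdiag_product_sum(sample)
fast = offdiag_via_square(sample)
```

A single finite sample says nothing about the limsup itself; the sketch only checks the algebra that makes $n\log\log n$ the natural normalization.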

Our main result is as follows: 

Theorem 1.1. Let $X, Y, X_i$, $i\in\mathbb N$, be i.i.d. random variables taking values in
$(S,\mathcal S)$ and let $h:S^2\to\mathbb R$ be a measurable function of two variables. Then,

\[
\limsup_{n}\frac{1}{n\log\log n}\Bigl|\sum_{1\le i\ne j\le n}h(X_i,X_j)\Bigr|<\infty\quad\text{a.s.}\tag{1.1}
\]

if and only if the following three conditions hold:
a) $h$ is canonical for the law of $X$
and there exists $C<\infty$ such that
b) for all $u\ge 10$,

\[
E(h^2(X,Y)\wedge u)\le C\log\log u,\tag{1.2}
\]

and
c)

\[
\sup\bigl\{Eh(X,Y)f(X)g(Y): Ef^2(X)\le 1,\ Eg^2(Y)\le 1,\ \|f\|_\infty<\infty,\ \|g\|_\infty<\infty\bigr\}\le C.\tag{1.3}
\]



Condition b) implies that

\[
E\,\frac{h^2}{(\log\log(|h|\vee e^e))^{1+\delta}}<\infty\tag{1.4}
\]

for all $\delta>0$ (and is implied by $Eh^2/\log\log(|h|\vee e^e)<\infty$). In particular condition b)
ensures the existence of the integrals in conditions a) and c). Condition c) implies
that the operator defined on $L^\infty(\mathcal L(X))$ by $Hf(y)=Eh(X,y)f(X)$ takes values
in $L^2(\mathcal L(X))$ and extends as a bounded operator to all of $L^2(\mathcal L(X))$. Moreover, if
with a slight abuse of notation we set $E_Xh(X,Y)f(X):=Hf(Y)$ for $f\in L^2$, then
condition c) is equivalent to

\[
E_Y\bigl(E_Xh(X,Y)f(X)\bigr)^2\le C^2Ef^2(X)\quad\text{for all } f\in L^2.\tag{1.5}
\]

(Here and in what follows, $E_X$ (resp. $E_Y$) indicates expectation with respect to $X$
(resp. $Y$) only.)

The integrability condition b) was proved to be necessary for the LIL (1.1) by
Giné and Zhang (1996), whereas the idea for condition c) comes from Dehling (1989)
who showed that if $h(x,y)$ is canonical and square integrable then

\[
\limsup_{n}\frac{1}{2n\log\log n}\sum_{1\le i\ne j\le n}h(X_i,X_j)
=\sup\bigl\{Eh(X,Y)f(X)f(Y): Ef^2(X)\le 1\bigr\}\quad\text{a.s.}
\]

We will not prove Theorem 1.1 directly, but instead we will prove first that
conditions b) and c) are necessary and sufficient for a decoupled and randomized
version of the LIL, namely, for

\[
\limsup_{n}\frac{1}{n\log\log n}\Bigl|\sum_{1\le i,j\le n}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\Bigr|<\infty\quad\text{a.s.},\tag{1.6}
\]

where $\{\varepsilon_i\}$ and $\{\tilde\varepsilon_j\}$ are Rademacher sequences independent of each other and of all
the other variables. (We recall that a Rademacher sequence is a sequence of independent
random variables taking on only the values $1$ and $-1$, each with probability $1/2$.) The
reasons for this are multiple. One is that necessity of condition c) follows as a consequence of
a recent result of Latała (1999) on estimation of tail probabilities of Rademacher
chaos variables. Another reason is that, because of the Rademacher multipliers,
truncation of the kernel will result in symmetric, and hence mean zero, variables;
this is important since the proof of sufficiency contains several relatively complicated
truncations of $h$. Moreover, part of the core of the proof of sufficiency consists of
an iterative application of an exponential bound for sums of independent random
variables and vectors, and having decoupled expressions makes this iteration possible
(although we could use, alternatively, an exponential inequality for martingale
differences that does not require decoupled expressions).

The exponential inequality in question is Talagrand's (1996) uniform Prohorov 
inequality. This inequality depends on two parameters, the bound of the vari- 
ables and the weak variance of their sum, and to apply it iteratively requires not 
only that h be truncated at a low level, but that the conditional second moments 
of these truncations of h be small as well. This explains the relatively complicated 
multi-step truncation procedure in the proof of sufficiency. 

Finally, the limit (1.6) will imply the limit (1.1) by a two stage symmetrization 
argument that will also require control of the conditional expectations of the sums; 
this control will be achieved once more, again after multiple truncations, by means 
of Talagrand's exponential inequality. 

Section 2 contains several known results needed in the sequel. Section 3 is 
devoted to the proof of the LIL for decoupled, randomized kernels, and Section 4 
reduces the LIL for canonical kernels to this case. In Section 5 we complete the 
proof of Theorem 1.1 and make several comments about the limsup in (1.1) and the 
limit set of the LIL sequence. 

We adhere in what follows to the following notation (some of it already set up
above):

o $h$ is a measurable real function of two variables defined on $(S^2,\mathcal S\otimes\mathcal S)$, symmetric
in its entries.

o $X, X_1, X_2,\dots$ and $Y, Y_1, Y_2,\dots$ denote two independent, equidistributed
sequences of i.i.d. $S$-valued random variables.

o We write $Ef(h)$ for $Ef(h(X,Y))$, and $E_X$, $\Pr_X$ (resp. $E_Y$, $\Pr_Y$) denote
expected value and probability with respect to the random variables $X, X_i$ (resp.
$Y, Y_i$) only.

o $\varepsilon_1,\varepsilon_2,\dots$ and $\tilde\varepsilon_1,\tilde\varepsilon_2,\dots$ are two independent Rademacher sequences,
independent of all other random variables.

o We write $L_2x$ and $L_3x$ instead of $L(L(x))$ and $L(L(L(x)))$, where $L(x)=\max(\log x,1)$.

o In all proofs $C$ denotes a universal constant which may change from line to line
but does not depend on any parameters.
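The truncated iterated logarithms $L$, $L_2$, $L_3$ from the notation list are easy to misread, since the floor at 1 is applied at every level of the iteration. A minimal sketch, assuming natural logarithms and positive arguments (the Python names `L`, `L2`, `L3` are ours):

```python
import math


def L(x):
    """Truncated logarithm L(x) = max(log x, 1), for x > 0."""
    return max(math.log(x), 1.0)


def L2(x):
    """L_2 x = L(L(x)); never smaller than 1."""
    return L(L(x))


def L3(x):
    """L_3 x = L(L(L(x)))."""
    return L(L(L(x)))
```

For large arguments $L_2x$ agrees with $\log\log x$, while for small ones (for instance $x=10$) the truncation is active and $L_2x=1$.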

2. Preliminary results. For convenience, we isolate in this section several known 
results needed below. 

(A) Hoeffding's decomposition. The $U$-statistics with kernel $h$ (not necessarily
symmetric in its entries) based on $\{X_i\}$ are defined as

\[
U_n(h)=\frac{1}{n(n-1)}\sum_{1\le i\ne j\le n}h(X_i,X_j).
\]

By considering instead the kernel $\tilde h(x,y)=(h(x,y)+h(y,x))/2$, we have

\[
U_n(h)=U_n(\tilde h)=\binom{n}{2}^{-1}\sum_{1\le i<j\le n}\tilde h(X_i,X_j).
\]

So, we will assume $h$ symmetric in its entries in all that follows.
Suppose $E|h(X,Y)|<\infty$. Then,

\[
\begin{aligned}
h(x,y)-Eh(X,Y)&=[h(x,y)-E_Yh(x,Y)-E_Xh(X,y)+Eh(X,Y)]\\
&\quad+[E_Yh(x,Y)-Eh(X,Y)]+[E_Xh(X,y)-Eh(X,Y)]\\
&:=\pi_2h(x,y)+\pi_1h(x)+\pi_1h(y),
\end{aligned}\tag{2.1}
\]

where the identities hold a.s. for $\mathcal L(X)\times\mathcal L(X)$. The kernel $\pi_2h$ is canonical (or
degenerate) for the law of $X$ as $E_X\pi_2h(X,Y)=E_Y\pi_2h(X,Y)=0$ a.s., and $\pi_1h(X)$
is centered. This decomposition of $h$ gives rise to Hoeffding's decomposition of the
corresponding $U$-statistics,

\[
\sum_{1\le i<j\le n}h(X_i,X_j)=\sum_{1\le i<j\le n}\pi_2h(X_i,X_j)+(n-1)\sum_{i=1}^{n}\pi_1h(X_i)+\binom{n}{2}Eh(X,Y),\tag{2.2}
\]

and of their decoupled versions,

\[
\sum_{1\le i,j\le n}h(X_i,Y_j)=\sum_{1\le i,j\le n}\pi_2h(X_i,Y_j)+n\sum_{i=1}^{n}\pi_1h(X_i)+n\sum_{j=1}^{n}\pi_1h(Y_j)+n^2Eh(X,Y).\tag{2.3}
\]
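The decomposition (2.1) and the identity (2.2) can be checked by hand on a finite state space, where all expectations are finite sums. The sketch below assumes a symmetric kernel given as a nested list on $S=\{0,\dots,m-1\}$ and a probability vector `p`; the specific numbers and the helper name `hoeffding_parts` are hypothetical choices for illustration.

```python
import random
from math import comb


def hoeffding_parts(p, h):
    """Return (Eh, pi_1 h, pi_2 h) for a symmetric kernel h on {0..m-1} with law p."""
    m = len(p)
    # row[x] = E_Y h(x, Y); by symmetry this also equals E_X h(X, x).
    row = [sum(p[y] * h[x][y] for y in range(m)) for x in range(m)]
    mean = sum(p[x] * row[x] for x in range(m))
    pi1 = [row[x] - mean for x in range(m)]
    pi2 = [[h[x][y] - row[x] - row[y] + mean for y in range(m)] for x in range(m)]
    return mean, pi1, pi2


p = [0.2, 0.5, 0.3]
h = [[1.0, -2.0, 0.5], [-2.0, 3.0, 1.5], [0.5, 1.5, -1.0]]
mean, pi1, pi2 = hoeffding_parts(p, h)

# Check identity (2.2) on one simulated sample of size n = 12.
rng = random.Random(1)
xs = rng.choices(range(3), weights=p, k=12)
n = len(xs)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
lhs = sum(h[xs[i]][xs[j]] for i, j in pairs)
rhs = (sum(pi2[xs[i]][xs[j]] for i, j in pairs)
       + (n - 1) * sum(pi1[x] for x in xs)
       + comb(n, 2) * mean)
```

Here `pi2` has zero conditional mean in each argument, which is exactly the canonical property used throughout the paper, and `lhs` and `rhs` agree up to rounding because (2.2) holds pointwise.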



(B) The equivalence of several LIL statements. The following lemma contains necessary
randomization and integrability conditions for the LIL:

Lemma 2.1. (Giné and Zhang, 1996). (a) (Integrability.) There exists a universal
constant $K$ such that, if

\[
\sum_{n=1}^{\infty}\Pr\Bigl\{\frac{1}{2^n\log n}\Bigl|\sum_{1\le i,j\le 2^n}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\Bigr|>C\Bigr\}<\infty\tag{2.4}
\]

for some $C<\infty$, then

\[
\limsup_{u\to\infty}\frac{E(h^2(X,Y)\wedge u)}{L_2u}\le KC^2.\tag{2.5}
\]

(b) (Randomization and decoupling, partial.) The LIL

\[
\limsup_{n}\frac{1}{nL_2n}\Bigl|\sum_{1\le i<j\le n}h(X_i,X_j)\Bigr|\le C\quad\text{a.s.}\tag{2.6}
\]

for some $C<\infty$ implies

\[
\sum_{n=1}^{\infty}\Pr\Bigl\{\frac{1}{2^nLn}\max_{k\le 2^n}\Bigl|\sum_{1\le i,j\le k}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\Bigr|>2^7C\Bigr\}<\infty.
\]

In particular, the LIL implies both the integrability condition (2.5) and the randomized
and decoupled LIL, that is,

\[
\limsup_{n}\frac{1}{nL_2n}\Bigl|\sum_{1\le i,j\le n}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\Bigr|\le D\quad\text{a.s.}\tag{2.7}
\]

with $D=KC$ for some universal constant $K$.



Part (a) is contained in the proof of Theorem 3.1 in Gine and Zhang (1996), 
while part (b) is the content of Theorem 3.1 and Lemma 3.3 there. 

We recall that the limsups at the left hand sides of (2.6) and (2.7) are always 
a.s. constant (finite or infinite) by the Hewitt-Savage zero-one law. 

Decoupling gives the following equivalence between the LIL and its decoupled 
version. 



Lemma 2.2. (a) The LIL (2.6) is equivalent to the decoupled LIL, that is, to

\[
\limsup_{n}\frac{1}{nL_2n}\Bigl|\sum_{1\le i\ne j\le n}h(X_i,Y_j)\Bigr|\le D\quad\text{a.s.}\tag{2.8}
\]

for some $D<\infty$, meaning that if (2.6) holds for $C$ then (2.8) holds for $D=KC$
and that if (2.8) holds for $D$ then (2.6) holds for $C=KD$, where $K$ is a universal
constant.

(b) The decoupled and randomized LIL (2.7) is equivalent to the randomized LIL

\[
\limsup_{n}\frac{1}{nL_2n}\Bigl|\sum_{1\le i\ne j\le n}\varepsilon_i\varepsilon_jh(X_i,X_j)\Bigr|\le C\quad\text{a.s.}\tag{2.9}
\]

for some $C$ finite (with $C$ and $D$ related as in part (a)).

(c) The LIL (2.7) implies convergence of the series (2.4) for some $C=KD<\infty$, $K$
a universal constant, hence it also implies the integrability condition (2.5) (with $C$
replaced by $D$).



Proof. (a) We can equivalently write (2.6) as

\[
\lim_{k\to\infty}\Pr\Bigl\{\sup_{n\ge k}\frac{1}{nL_2n}\Bigl|\sum_{1\le i<j\le n}h(X_i,X_j)\Bigr|>C\Bigr\}=0
\]

for some $C<\infty$, hence as

\[
\lim_{k\to\infty}\Pr\Bigl\{\Bigl\|\sum_{1\le i<j<\infty}h_{i,j,k}(X_i,X_j)\Bigr\|>C\Bigr\}=0,
\]

where

\[
h_{i,j,k}:=\Bigl(\frac{h}{kL_2k},\frac{h}{(k+1)L_2(k+1)},\dots,\frac{h}{nL_2n},\dots\Bigr)
\]

if $j\le k$ and

\[
h_{i,j,k}:=\Bigl(0,\overset{j-k}{\dots},0,\frac{h}{jL_2j},\frac{h}{(j+1)L_2(j+1)},\dots,\frac{h}{nL_2n},\dots\Bigr)
\]

if $j>k$ are $\ell^\infty$-valued functions and $\|\cdot\|$ denotes the sup of the coordinates. Then,
the decoupling inequalities of de la Peña and Montgomery-Smith (1994) apply to
show that the above tail probabilities are equivalent up to constants to those of the
corresponding decoupled expressions, thus giving the equivalence between (2.6) and
(2.8).

(b) If (2.9) holds, then (2.7) without diagonal terms (that is, without the summands
corresponding to $i=j$) holds too by the first part of the proof applied to the
kernel $\alpha\beta h(x,y)$. Moreover, (2.9) implies the integrability condition (2.5) by Lemma
2.1 (note that if $\{\varepsilon_i^{(j)}\}$, $j=1,2,3$, are three independent Rademacher sequences,
then $\{\varepsilon_i^{(1)}\varepsilon_i^{(2)}\}$ and $\{\varepsilon_i^{(1)}\varepsilon_i^{(3)}\}$ are also independent Rademacher sequences) and, as
a consequence, $h$ is integrable. Hence, by the law of large numbers, the diagonal
in (2.7) is irrelevant, showing that (2.7) holds with the diagonal included. If (2.7)
holds, then we also have $E|h|<\infty$: a modification of the proof of the converse
central limit theorem in Giné and Zinn (1994), consisting in replacing use of the law
of large numbers by use of inequality (3.7) in Giné and Zhang (1996), shows that
if the sequence $\{(nL_2n)^{-1}\sum_{i,j\le n}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\}$ is stochastically bounded, then
$Eh^2(X,Y)\wedge u\le C(L_2u)^2$ for some $C<\infty$, in particular, that $E|h|<\infty$. So, we
can delete the diagonal in (2.7), and then apply the first part of the lemma to undo
the decoupling.

(c) Statement (c) follows from (b) because, by Lemma 2.1, (2.9) implies convergence
of the series (2.4) for some $C<\infty$. □

The following lemma, together with the previous ones, will allow blocking and 
will reduce the proof of sufficiency of the LIL to showing that a series of tail prob- 
abilities converges (just as with sums of i.i.d random variables). 

Lemma 2.3. There exists a universal constant $C<\infty$ such that for any kernel $h$
and any two sequences $X_i$, $Y_j$ of i.i.d. random variables we have

\[
\Pr\Bigl\{\max_{k\le m,\,l\le n}\Bigl|\sum_{i\le k,\,j\le l}h(X_i,Y_j)\Bigr|>t\Bigr\}\le C\Pr\Bigl\{\Bigl|\sum_{i\le m,\,j\le n}h(X_i,Y_j)\Bigr|>t/C\Bigr\}\tag{2.10}
\]

for all $m,n\in\mathbb N$ and for all $t>0$.

Proof. Montgomery-Smith's (1993) maximal inequality for i.i.d. sums asserts that
if $Z_i$ are i.i.d. r.v.'s with values in some Banach space $B$ then for some universal
constant $C_1$ and all $t>0$ we have

\[
\Pr\Bigl\{\max_{k\le n}\Bigl\|\sum_{i\le k}Z_i\Bigr\|>t\Bigr\}\le C_1\Pr\Bigl\{\Bigl\|\sum_{i\le n}Z_i\Bigr\|>t/C_1\Bigr\}.
\]

We apply this inequality to $B=\ell^n_\infty$ and $Z_i=\bigl(\sum_{j\le l}h(X_i,y_j): l\le n\bigr)$ for fixed
values of $y_1,\dots,y_n$ to get

\[
\Pr\Bigl\{\max_{k\le m,\,l\le n}\Bigl|\sum_{i\le k,\,j\le l}h(X_i,Y_j)\Bigr|>t\Bigr\}\le C_1\Pr\Bigl\{\max_{l\le n}\Bigl|\sum_{i\le m,\,j\le l}h(X_i,Y_j)\Bigr|>t/C_1\Bigr\}.
\]

In a similar way we may prove

\[
\Pr\Bigl\{\max_{l\le n}\Bigl|\sum_{i\le m,\,j\le l}h(X_i,Y_j)\Bigr|>t/C_1\Bigr\}\le C_1\Pr\Bigl\{\Bigl|\sum_{i\le m,\,j\le n}h(X_i,Y_j)\Bigr|>t/C_1^2\Bigr\}.
\]

Thus the assertion holds with $C=C_1^2$. □
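Corollary 2.4. If

\[
\sum_{n=1}^{\infty}\Pr\Bigl\{\frac{1}{2^nLn}\Bigl|\sum_{1\le i,j\le 2^n}h(X_i,Y_j)\Bigr|>C\Bigr\}<\infty\quad\text{a.s.}\tag{2.11}
\]

for some $C<\infty$, then there is a universal constant $K$ such that

\[
\limsup_{n}\frac{1}{nL_2n}\Bigl|\sum_{1\le i,j\le n}h(X_i,Y_j)\Bigr|\le KC\quad\text{a.s.}\tag{2.12}
\]

Proof. Since, for any $0<D<\infty$ (with $c$ an absolute constant),

\[
\begin{aligned}
\Pr\Bigl\{\sup_{n\ge N}\frac{1}{nL_2n}\Bigl|\sum_{1\le i,j\le n}h(X_i,Y_j)\Bigr|>D\Bigr\}
&\le\Pr\Bigl\{\sup_{k\ge[\log N/\log 2]}\ \max_{2^{k-1}<n\le 2^k}\frac{c}{2^kLk}\Bigl|\sum_{1\le i,j\le n}h(X_i,Y_j)\Bigr|>D\Bigr\}\\
&\le\sum_{k\ge[\log N/\log 2]}\Pr\Bigl\{\max_{n\le 2^k}\Bigl|\sum_{1\le i,j\le n}h(X_i,Y_j)\Bigr|>\frac{D2^kLk}{c}\Bigr\},
\end{aligned}
\]

the result follows from Lemma 2.3. □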

Applying Corollary 2.4 to the kernel $\alpha\beta h(x,y)$ we obtain the converse of Lemma
2.2(c). Hence,

Corollary 2.5. Consider the statements

\[
\limsup_{n}\frac{1}{nL_2n}\Bigl|\sum_{1\le i,j\le n}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\Bigr|\le C\quad\text{a.s.}
\]

and

\[
\sum_{n=1}^{\infty}\Pr\Bigl\{\frac{1}{2^nLn}\Bigl|\sum_{1\le i,j\le 2^n}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\Bigr|>D\Bigr\}<\infty.
\]

There is a universal constant $K$ such that if the first statement holds for some
$C<\infty$ then the second holds for $D=KC$, and conversely, if the second holds for
some $D<\infty$ then so does the first, for $C=KD$.

We will also require the following partial converse to Lemma 2.1(b) regarding 
the regular LIL and convergence of series of tail probabilities: 

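Corollary 2.6. Suppose $E|h|<\infty$. If

\[
\sum_{n=1}^{\infty}\Pr\Bigl\{\frac{1}{2^nLn}\Bigl|\sum_{1\le i,j\le 2^n}h(X_i,Y_j)\Bigr|>C\Bigr\}<\infty\quad\text{a.s.}
\]

for some $C<\infty$ then the LIL holds, that is, there is a universal constant $K$ such
that

\[
\limsup_{n}\frac{1}{nL_2n}\Bigl|\sum_{1\le i<j\le n}h(X_i,X_j)\Bigr|\le KC\quad\text{a.s.}
\]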

Proof. Convergence of the series implies (2.12), that is, the decoupled LIL with 
diagonal terms included. Since E\h\ < oo, the diagonal terms are irrelevant and 
therefore the decoupled LIL (2.8) holds. The result now follows from Lemma 2.2(a). 
□ 

In Section 4 we will apply the conclusion of Corollary 2.6 under the assumption 
that the decoupled and randomized LIL (2.7) holds: this is possible because (2.7) 
implies integrability of h, as indicated in the proof of Lemma 2.2(b). 

(C) Inequalities. As mentioned in the Introduction, the following two inequalities 
will play a basic role in the proof of Theorem 1.1. The first consists of a sharp 
estimate of the tail probabilities of Rademacher chaos variables (it is in fact part of 
a sharper two sided estimate) . 

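Lemma 2.7. (Latała, 1999). There exists a universal constant $c>0$ such that, for
all matrices $(a_{i,j})$ and for all $t>0$,

\[
\Pr\Bigl\{\Bigl|\sum_{i,j}a_{i,j}\varepsilon_i\tilde\varepsilon_j\Bigr|\ge c\,|||(a_{i,j})|||_t\Bigr\}\ge c\wedge e^{-t},\tag{2.13}
\]

where $|||(a_{i,j})|||_t$ is defined as

\[
\sup\Bigl\{\sum_{i,j}a_{i,j}b_ic_j:\ \sum_ib_i^2\le t,\ \sum_jc_j^2\le t,\ |b_i|,|c_j|\le 1\ \text{for all } i,j\Bigr\}.\tag{2.14}
\]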



The second is a uniform Prohorov inequality due to Talagrand. It combines 
Theorem 1.4 in Talagrand (1996) with Corollary 3.4 in Talagrand (1994). 

Lemma 2.8. (Talagrand, 1996). Let $\{X_i\}$, $i=1,\dots,n$ for any $n\in\mathbb N$, be independent
random variables with values in a measurable space $(S,\mathcal S)$, let $\mathcal F$ be a countable
class of measurable functions on $S$ and let

\[
Z:=\sup_{f\in\mathcal F}\sum_{i=1}^{n}f(X_i).
\]

There exists a universal constant $K$ such that for all $t>0$ and $n\in\mathbb N$, if

\[
\max_{1\le i\le n}\ \sup_{f\in\mathcal F}\ \operatorname*{ess\,sup}_{\omega\in\Omega}|f(X_i(\omega))|\le U,\qquad
E\Bigl(\sup_{f\in\mathcal F}\sum_{i=1}^{n}f^2(X_i)\Bigr)\le V
\]

and

\[
\sup_{f\in\mathcal F}\sum_{i=1}^{n}Ef^2(X_i)\le\sigma^2,
\]

then

\[
\Pr\{|Z-EZ|>t\}\le K\exp\Bigl(-\frac{t}{KU}\log\Bigl(1+\frac{tU}{V}\Bigr)\Bigr)
\le K\exp\Bigl(-\frac{t}{KU}\log\Bigl(1+\frac{tU}{\sigma^2+8UE|Z|}\Bigr)\Bigr).\tag{2.15}
\]

In fact, we will only use the corresponding deviation inequality, that is, the
bound (2.15) for $\Pr\{Z>EZ+t\}$. Ledoux (1987) contains a simple proof of this
result based on logarithmic Sobolev inequalities.

When $\mathcal F$ consists of a single function $f$ and the variables $f(X_i)$ are centered this
inequality reduces, modulo constants, to the classical Prohorov inequality. For
convenience, we will refer below to Lemma 2.8 even in cases when Prohorov's inequality
suffices.

3. Symmetrized kernels. In this section we prove the following theorem, which 
constitutes the basic component of the proof of Theorem 1.1. 

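Theorem 3.1. The decoupled and randomized LIL holds, that is,

\[
\limsup_{n}\frac{1}{n\log\log n}\Bigl|\sum_{1\le i,j\le n}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\Bigr|<\infty\quad\text{a.s.}\tag{3.1}
\]

if and only if the following two conditions are satisfied for some $C<\infty$:

\[
E\min(h^2,u)\le CL_2u\quad\text{for all } u>0,\tag{3.2}
\]

and

\[
\sup\bigl\{Eh(X,Y)f(X)g(Y): Ef^2(X)\le 1,\ Eg^2(Y)\le 1,\ \|f\|_\infty<\infty,\ \|g\|_\infty<\infty\bigr\}\le C<\infty.\tag{3.3}
\]

Remark. We recall that, by Corollary 2.5, a necessary and sufficient condition for
the LIL (3.1) to hold is that

\[
\sum_{n=1}^{\infty}\Pr\Bigl\{\Bigl|\sum_{1\le i,j\le 2^n}\varepsilon_i\tilde\varepsilon_jh(X_i,Y_j)\Bigr|>C2^n\log n\Bigr\}<\infty\tag{3.4}
\]

for some $C<\infty$.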

Proof of necessity. The integrability condition (3.2) is necessary for (3.1) by
Lemma 2.2(c). The necessity of (3.3) will follow from Lemma 2.7. For this, we
estimate first $|||(h(X_i,Y_j): i,j\le 2^n)|||_{\log n}$, where $|||\cdot|||_t$ is as defined in (2.13).
Suppose that $f,g\in L^\infty$ are such that $Ef^2(X)=Eg^2(X)=1$ and set

\[
K:=|Eh(X,Y)f(X)g(Y)|,\tag{3.5}
\]

that we can assume strictly positive. Note that the integral exists by (3.2). Then
by the SLLN for i.i.d. r.v.'s and $U$-statistics we have a.s.

\[
n^{-1}\sum_{i\le n}f^2(X_i)\to Ef^2=1,\qquad n^{-1}\sum_{j\le n}g^2(Y_j)\to Eg^2=1
\]

and

\[
n^{-2}\Bigl|\sum_{i,j\le n}h(X_i,Y_j)f(X_i)g(Y_j)\Bigr|\to|Eh(X,Y)f(X)g(Y)|.
\]

So, for large enough n, 

Pr{2"» £ AX,) < 2 | > |, Pr{ 2 -" £ y 2 (y,) < 2} > I 



and 



Pr|2- 2 "| £ M^,WW^)|>^/2}> 



with if as in (3.5). Since f,g<E we have that, for large enough n, 

< 1 a.s. 



logn 



Then, it follows directly from the definition of 1 1 1 • 1 1 \ t that, on the intersection of the 
above five events, we have the bound 



|||(M^,^):^J<2 n )||| logn >^- 2 logn. 
Therefore, for large n, 

Pr{ 1 1 1 (h(Xi, Yj) :i,j< 2-) | | ^ > KT~ 2 logn} > \. 
Then, Lemma 2.7 implies that, for all n large enough, 

Pril h{X l ,Y J )e l e j > cK2 n ~ 2 log n}> l e - logn = 

II I / i 

K i,j<2 n J 

By (3.4), this implies that if the LIL holds then K is uniformly bounded, proving 
necessity of condition (3.3). □ 

Before starting the proof of sufficiency, it is convenient to show how the
integrability condition (3.2) limits the sizes of certain truncated conditional second
moments. To simplify notation, we define

\[
f_n(x)=E_Y\min(h^2(x,Y),2^{4n})\quad\text{and}\quad\tilde f_n(y)=E_X\min(h^2(X,y),2^{4n}).\tag{3.6}
\]

Lemma 3.2. For any kernel $h$ satisfying condition (3.2) we have that, for all $a>0$,

\[
\sum_{n}2^n\Pr_X\bigl\{E_Y\min(h^2(X,Y),2^{an})>2^n(\log n)^2\bigr\}<\infty.\tag{3.7}
\]

Moreover,

\[
\sum_{n}\frac{2^n}{(\log n)^k}\Pr\bigl\{f_n(X)>2^n(\log n)^{2-k}\bigr\}<\infty\quad\text{for all } k\ge 0.\tag{3.8}
\]

Proof. For $a$ fixed, we set $\gamma_k=\exp(2^{k+1})$ and $f_k(X)=E_Y\min(h^2,2^{a\gamma_k})$. Then,

\[
\begin{aligned}
\sum_{2^k\le\log n<2^{k+1}}2^n\Pr_X\bigl\{E_Y\min(h^2(X,Y),2^{an})>2^n(\log n)^2\bigr\}
&\le\sum_{2^k\le\log n<2^{k+1}}2^n\Pr_X\{f_k(X)>2^{n+2k}\}\\
&\le E\sum_{n}2^nI\bigl(f_k(X)>2^{n+2k}\bigr)\\
&\le 2^{1-2k}Ef_k(X)\le 2^{1-2k}CL_2(2^{a\gamma_k})\\
&\le 2^{1-2k}C(\log a+2^{k+1}).
\end{aligned}\tag{3.9}
\]

Convergence in (3.7) follows from (3.9). Condition (3.8) is an easy consequence
of (3.7) (as can be seen e.g. by making the approximate change of variables
$2^n/(\log n)^k\sim 2^m$ in (3.8) and comparing with (3.7) for $a>4$). □

Proof of sufficiency. Since this is only a matter of normalization we will assume
that conditions (3.2) and (3.3) are satisfied with $C=1$. By the Remark below
Theorem 3.1, proving the LIL is equivalent to showing that the series (3.4) converges
for some $C<\infty$. To establish this we will show in several steps that we may suitably
truncate $h$ by proving inequalities of the form

\[
\sum_{n}\Pr\Bigl\{\Bigl|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_jh_n(X_i,Y_j)\Bigr|>C2^n\log n\Bigr\}<\infty,\tag{3.10}
\]

where $h_n:=hI_{A_n}$ and $A_n$ are suitably chosen subsets of the product space. Then,
we will apply Lemma 2.8 conditionally to the truncated $h$ (several times, and after
some additional preparation).

Step 1. Inequality (3.10) holds for any $C>0$ if

\[
A_n\subset\{(x,y):\max(f_n(x),\tilde f_n(y))>2^n(\log n)^2\}.
\]

In this case, by (3.8),

\[
\begin{aligned}
\sum_{n}\Pr\Bigl\{\Bigl|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_jh_n(X_i,Y_j)\Bigr|>C2^n\log n\Bigr\}
&\le\sum_{n}\Pr\{\exists\,i\le 2^n: f_n(X_i)>2^n(\log n)^2\}\\
&\quad+\sum_{n}\Pr\{\exists\,j\le 2^n:\tilde f_n(Y_j)>2^n(\log n)^2\}\\
&\le 2\sum_{n}2^n\Pr\{f_n(X)>2^n(\log n)^2\}<\infty.
\end{aligned}
\]

Step 2. Inequality (3.10) holds for any $C>0$ if

\[
A_n\subset\{(x,y): h^2(x,y)>2^{2n}(\log n)^2\}.
\]

Indeed, by Chebyshev's inequality,

J2 Pt {\ E ^^n(^,^)| >C2 n logn| 

n ° i,J<2™ 

= 2^ C1 - g l^l / {|^l>2"logn} 

n 

= C- 1 E\h\J2 I —I(\h\>2 n logn) 

n 

STEP 3. Inequality (3.10) holds for any C > if 

A n C {(x,y) : 2 2 ™n" 4 < h 2 (x 7 y) < 2 2n (log n) 2 J n (x)J n (y) < 2™(logn) 2 }. 
If we use again Chebyshev's inequality, it suffices to prove that 

E-^l Si, 7 -<2™ £ i£jhn(Xi, Yj) | /oii\ 
„ t; < oo. (3.11) 
2 4n (logn) 4 V ' 
Notice however that, by iteration of Khinchin's inequality (or by direct computation),
there is $C<\infty$ (e.g. $C=18$) such that

C~ 1 e\ eiEjKiXi.Y^ <e\ h l( x i> Y i)\ 2 

i,j<2 n i,j<2 n 

<Y J Eh 4 n (X l ,Y J )+ EhKXi^hKX^Yj) 

+ EhKx^hKx^,) 

+ J2 EhKx^hKx^Yy). 

i^i',3^3' 

So, to prove (3.11) we have to check convergence of these four series. 
First series: 

E 2^(1^)4 " ^ 2 ^{lgnY EhAl{h ^ n ^ n ^ 



E h 4 Y ^ rrl{h 2 < 2 2n (logn) 2 ) 



2 2n (logn) 

£ 6Ehi mkw < °°- 

Second series: (below we use the notation h n := h n (X, Y), h n = h n (X, Y) and 
X is an independent copy of X) 

y 2^Eh 2 n (X,Y)h 2 n (X,Y) _ ys Ehjhl <2 ^ Ehlhll{\h\<\h\) 
^ 2 4n W 4 n ^2 n (logn) 4_ ^ 2 n (logn) 4 

n to n vo/ n v o / 

< 2Eh 2 h 2 I(\h\ < -i I{E x ^n(h 2 X n ) < 2™(logn) 2 ,/? < 2 4 ™) 

2 n (lognJ 4 

< 2Eh 2 h 2 I(\h\ <\h\)J2 onn 1 u T ( E x min (^ ~ h2 ) ^ ^(^gn) 2 , \h\ < 2 2n ) 

2 n (lognJ 4 

1 _ h 2 

< CEh 2 h 2 I(\h\ < \h\) = = — < CE = — < oo. 

3rd series: convergence follows just as for the second. 
4th series: here we have by (3.2) 

2 4n (Eh 2 n ) 2 Eh 2 n 



2 4n (logn) 4 ~~ ^ (logn) ; 



= dEh2 E 7T^3 A2 2n n" 4 < h\x,y) < 2 2 ™(logn) 2 ) 
z — ' ( log n ) 

^ h 2 
where we use the fact that

\[
\operatorname{Card}\{n: 2^{2n}n^{-4}\le h^2(x,y)\le 2^{2n}\log^2n\}\sim 2L_2h.
\]

This completes the third Step.

Step 4. Inequality (3.10) holds for any C > if 

)g', 



, 22n 2 n ~\ 
A n C I (x, y) : h 2 (x, y) < — < max(/ n (x), / n (y)) < 2™(logn) 2 L 



We follow the proof of the previous step. The only difference is in the proof of 
convergence of the fourth series. We have for n > 2 



Eh n < 2^£ , min(/i ,2 n )/{2»(iogn) 2 - fe </„(x)<2™(io g n)3- fe } 
k=i 

3 

< ^2" +1 (logn) 3 - fc Pr{/ n (X) > 2™(logn) 2 - fc }. 



k=i 

Thus, by (3.8), 

E 7T% < E E Pr (/n(X) > 2^(logn) 2 - fc } < oo. 

^(logn) 3 ^V( logn ) 

For the next step, we define the functions 

g n (x) = E Y hI { \ h \> 2nn 2 } . (3.12) 

Step 5. Inequality (3.10) holds for any $C>0$ if

\[
A_n\subset\{(x,y):\max(g_n(x),g_n(y))>1\}.
\]

Assumption (3.2) implies that Pr{|/i| > v} < v~ 2 L 2 v 2 . Hence, E\h\I{\h\> s y < 
Cs~ 1 L 2 s for s > 1. Therefore, 

5>Pr{K(X)|>l}<C£^<oo, 

n n 

and the same is true for g n (Y). 

Step 6. Inequality (3.10) holds for any C > if 

r 2 n 2 n 2 2n i 

C \(x,y) : / n (x) > — , f n (y) > — , /i 2 (x,y) < — \. 

To see this we note first that 

n2n o2n , nn .. , on .. 

Ehl < < ^Pr{/„(*) > -}Pr{/n(y) > -} 

^2 2 " ( nEf n (X) \ 2 ^-(logn) 2 



~ n 4 V 2 n 7 - n 2 
since $Ef_n(X)=E\min(h^2,2^{4n})\le C\log n$ by (3.2). Now we may conclude Step 6
by Chebyshev's inequality as

Si,j<2™ £ i£jhn(Xi, Yj) | ^ Eh^ ^ < oo 

2 2n (logn) 2 ~~ (logn) 2 ~ ^ n 2 °°" 

Step 7. Inequality (3.10) holds for some C > if 

r 2 n 2 n 2 2n 

4„ = i (a?,y) : fn(x) < ,/ n (y) < — ,0n(aO < l,0n(y) < l,/l 2 (x,y) < — }. 

L logn n 

This is the most involved step, and the only one (except for the similar Step 
8 below) where we use condition (3.3). To prove (3.10) in this case, we will use 
Prohorov's inequality (or Lemma 2.8) together with the following four lemmas (one 
of which also uses Talagrands's inequality). 

Lemma 3.3. For all n e N, 



Prj| Y,^hn(X t ,Y) 

V ,VOn 



> 2 n+4 > < 2 _4n 



i<2 T 

and 

> 2'" 1 !> < x 



EPr<^ max \y~] £ihn(Xi t Yj) 
Kj<2 n \ 
n y i<2™ 



Proof. We note that A n C {(x,y) : < n~ 1 2 n J n (y) < n _1 2 n } and then 

apply Bernstein's inequality or Prohorov's inequality to obtain that, for any Y, 



^ i<2™ ' 



< e" 4 ", 



which clearly implies the Lemma. (Lemma 2.8 instead of Bernstein's or Prohorov's 
inequality would simply change multiplicative constants.) □ 

Before formulating the next lemma it is convenient to define a sequence c n by 
the formula 

C n = Eh 2 I{ 2 ™n- 2 <\h\<2™n 2 }, n G N. (3.13) 

Lemma 3.4. We have 

2 logn 



J^exp 



< oo. 



Proof. Condition (3.2) implies that, for any k>2, 

Cn < CkE\h\ 2 I^ h ^ <2e k+l ^ e k+iyy < (7/c 2 , 

fc<log n<fc+l 

(where the second constant is different from the first) since the largest number of 
intervals I n = [n~ 2 2 n , n 2 2 n ], k < logn < k + 1, that can overlap with any given one 
of them is not larger than 6(k + 1). Hence, 

Card{n : k < logn < k + 1, c n > 1} < C/c 2 . 
Condition (3.2) also implies $c_n\le 2\log n$ (note that $c_1=0$). So,



J^exp 

n 



2 logn 



= J < ^exp(-v / 21ogn) + ex P(- 

' n c„>l ^ 



2 logn 



VI + 21ogn 



< ^exp(-v / 21ogn) + ^ C/c 2 exp(-v / A;) < oo. 



□ 



The following lemma is well known but a proof is provided for the reader's 
convenience. 

Lemma 3.5. If a kernel $k$ satisfies $E_X|k(X,y)|\le 1$ and $E_Y|k(x,Y)|\le 1$ a.s., then
$k$ defines an operator on $L^2(\mathcal L(X))$ with norm bounded by 1, that is, condition (3.3)
holds for $h=k$ and $C=1$ (and therefore so does condition (1.5)).

Proof. We need to check that

\[
|E_XE_Yk(X,Y)f(X)g(Y)|\le[Ef^2(X)Eg^2(Y)]^{1/2}
\]

whenever $\|f\|_\infty,\|g\|_\infty<\infty$. But, assuming (without loss of generality) that $k$, $f$
and $g$ are nonnegative,

\[
\begin{aligned}
E_XE_Yk(X,Y)f(X)g(Y)&=E_X\bigl[f(X)E_Y\bigl(k^{1/2}(X,Y)k^{1/2}(X,Y)g(Y)\bigr)\bigr]\\
&\le E_X\bigl[f(X)(E_Yk(X,Y))^{1/2}(E_Yk(X,Y)g^2(Y))^{1/2}\bigr]\\
&\le E_X\bigl[f(X)(E_Yk(X,Y)g^2(Y))^{1/2}\bigr]\\
&\le(E_Xf^2(X))^{1/2}\bigl(E_XE_Yk(X,Y)g^2(Y)\bigr)^{1/2}
\end{aligned}
\]

and now the inequality follows by applying Fubini and using $E_Xk(X,Y)\le 1$. □
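Lemma 3.5 is a form of the Schur test, and its conclusion can be checked numerically when $X$ and $Y$ are uniform on a finite set, so that expectations become averages. The sketch below is only an illustration under that assumption; the matrix size, the random entries and the helper names are hypothetical.

```python
import math
import random


def normalize_kernel(k):
    """Rescale k so that E_Y|k(x,Y)| <= 1 and E_X|k(X,y)| <= 1 (X, Y uniform)."""
    m = len(k)
    row_means = [sum(abs(v) for v in k[x]) / m for x in range(m)]
    col_means = [sum(abs(k[x][y]) for x in range(m)) / m for y in range(m)]
    scale = max(row_means + col_means)
    return [[v / scale for v in row] for row in k]


def bilinear_form(k, f, g):
    """E k(X,Y) f(X) g(Y) for independent X, Y uniform on {0..m-1}."""
    m = len(f)
    return sum(k[x][y] * f[x] * g[y] for x in range(m) for y in range(m)) / (m * m)


def l2_norm(f):
    """(E f^2(X))^{1/2} for X uniform on {0..m-1}."""
    return math.sqrt(sum(v * v for v in f) / len(f))


rng = random.Random(2)
m = 8
kernel = normalize_kernel([[rng.gauss(0, 1) for _ in range(m)] for _ in range(m)])
f = [rng.gauss(0, 1) for _ in range(m)]
g = [rng.gauss(0, 1) for _ in range(m)]
value = abs(bilinear_form(kernel, f, g))
bound = l2_norm(f) * l2_norm(g)
```

The point of the lemma is that only the $L^1$ size of the rows and columns of $k$ is needed, which is what makes it usable for the large-values part of the truncated kernel in Lemma 3.6.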
Lemma 3.6. There exists C\ < oo such that 

J2 Pr { E y{Yl £ihn(Xi,Y)Y > C.VTT^ 2 n \ogn\ < oo. 

n ^ i<2 n ' 

Proof. Let H Y be Z/ 2 (^, cr(Y), Pr), that is, H Y is the space of all square inte- 
gable random variables f(Y) where / is a Borel measurable function. Let Xj := 
£ih n (Xi, Y) for i = 1, . . . , 2 n . Then, Xj are symmetric i.i.d. random vectors with 
values in H Y . We define 



2" |- 

= supJ]/(X i ) = E Y (Y,eMXi,Y)y 



1/2 



where T is a countable dense subset of the unit ball of H' Y = H Y and we write 
/(') := (/> ')• We will apply Lemma 2.8 to Z. For this, we must estimate EZ and 
determine suitable U and a 2 . We have 



EZ < (EZ 2 ) 1/2 = [TEhlX 1 '" < V / 2Moi^ 



21 1/2 



(3.14) 
by (3.2). Since

\[
\sup_{f\in\mathcal F}|f(\mathbf X_i(\omega))|=\|\mathbf X_i(\omega)\|_{H_Y}=\sqrt{E_Yh_n^2(X_i(\omega),Y)}\le\sqrt{\frac{2^n}{\log n}},
\]

we can take

\[
U=\sqrt{\frac{2^n}{\log n}}\tag{3.15}
\]

in Lemma 2.8 for Z. Moreover, for each / e J 7 , 

3 



£/ 2 (X.) = E(E Y h n (X i ,Y)f(Y)Y <zY,E{E Y h^{X h Y)f{Y))\ 



, 2 

< 3 

where 



with 

r 2 n 2 n "l 

B n := \ (x, y) : / n (^) < ; , fn(y) < — , £n(z) < 1, 9n(y) <l\, 



since 



u _ 1,(1) _ 1,(2) _ l(3) 
u n — n n n n n n . 



Now, 

E{E Y h^(X i ,Y)f(Y)) 2 <1 
by condition (1.5) (which is equivalent to (1.3)=(3.3)), 

i?(^/ii 2 )(x,,y)/(y)) 2 < i?(/,( 2 )) 2 < Cn 

by Cauchy-Schwartz and the definition of c n in (3.13), and 

E{E Y h ( i\X^Y)f{Y)f <1 

by Lemma 3.5 (see (1.5) once more). Therefore, we can take a 2 in Lemma 2.8 for 
Z to be 

(7 2 = 3-2 n (2 + c n ) <6-2 n (l + c n ). (3.16) 
Then, on account of (3.14)-(3.16), Lemma 2.8 gives, with $C_2=(\sqrt{C_1}-1)^2$,
Pr j E Y \ SihniX^Y)^ > dVTTc^ 2"logn| 

= Pr > y / Civ / TT^2"logn| 

< Pr jz - EZ > \]c 2 ^TTc^ 2 n log n j 

< if exp^-^p^v 7 ! + c n logn log(l + 



< if exp ( 



6(1 + +8 

Vc 2 



V * ^ /rT ^ lo g- lo g( 1 + U( 1 + c 



< A - exp ^___ log ( 1 + _)__j, 

where in the last line we have used that the function x~ l log(l + x) is monotone 
decreasing. Taking K~ 1 yfC 2 ~ log(l + ^/C^/IA) > 2 yields the bound 

Fr { E y\Yl £ihn(Xi,Y)\ 2 > dVTTc^ 2™lognj < Kexp(-^IIL^ 



and Lemma 3.6 follows from Lemma 3.4. □ 

Now we complete the proof of Step 7. For n fixed, set 

d(y) '■= ^2 e ih n {Xi,y) and dj := £j^(^j)^{|d|<2™+4 j s y d 2 (y)<Ci2™(iogn) v TT^r}(^j') 

i<2 n 

for 1 < j < 2 n . Then, 



Prj| ^jh n (Xi,Yj) > C2 n logn| = Pr j | g j d ( Y j)\ > C2 n logn| 

i,j'<2™ ' ^ j<2 n ' 

< Pr{3 j < 2 n : d 3 ± diY,)} + Pr{| £ 



> C2 n \ogn 



But, 



:=/„ + //„. 



7 n < Pr<^ max V" £j/j, n (JQ, Yj) 



> 2 



n+4 



i<2 n 

and Lemma 3.3 and Lemma 3.6 show that 



E'« 



< 00. 



(3.17) 
To estimate $II_n$ we can apply Bernstein's or Prohorov's inequality conditionally on
the sequence $\{X_i\}$. For convenience we will use Lemma 2.8. We can take $U=2^{n+4}$
and $V=C_12^{2n}(\log n)\sqrt{1+c_n}$ to get



Pry 1 1 g A > C2 n logn| 
< K exp 



1 C2 n logn 



K 2 n + 4 



/ C2 2T1+4 logn \\ 



< 



/ C / 2 4 C\ logn \\ 
^exp(-^log(l + ^-) 7 _)j. 



Taking C so that 



C , / 2 4 C 

log ( 1+ CT 



2 4 if 



> 2 



shows, by Lemma 3.4, that 



< oo. 



(3.18) 



(3.17) and (3.18) complete the proof of Step 7. 
Step 8. Inequality (3.10) holds for some C < oo if 



n 
n ' 



c 2 n 2 n 2 2n 
4„ = | : f n (x) < —,f n (y) < ] , 9n(x) < l,g n (y) < l,h 2 (x,y) < — } 

This can be done in the same way as Step 7. 

It is clear that we can write S x S = uf =1 A l n with Aj l ,...,Af l disjoint, and A\ 
satisfying the conditions in Step i for each n. Then, h = X)i=i hi A* = Yl^=i h 
Since for each i the kernels h l n satisfy condition (3.10) for some C < oo, it follows 
by the triangle inequality that the series (3.4) for h converges for some C < oo, 
proving the sufficiency part of Theorem 3.1. □ 

4. Canonical kernels. In this section we show that, for canonical kernels, the LIL 
(1.1) is equivalent to the decoupled and randomized LIL. The preliminary results 
in Section 2(B) yield that the regular LIL implies the decoupled and randomized 
one. The converse implication, however, seems to require Theorem 3.1. The first 
step consists of the following simple inequality, rooted in known symmetrization 
techniques. 

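Lemma 4.1. For any kernel $h$, and for any $n\in\mathbb{N}$ and $t > 0$, we have

$$\begin{aligned}
\Pr\Big\{\Big|\sum_{i,j\le n}h(X_i,Y_j)\Big| > 10t\Big\}
&\le 16\Pr\Big\{\Big|\sum_{i,j\le n}\varepsilon_i\tilde\varepsilon_j h(X_i,Y_j)\Big| > t\Big\}\\
&\quad + 4\Pr\Big\{E_Y\Big|\sum_{i,j\le n}\varepsilon_i h(X_i,Y_j)\Big| > t\Big\}
 + \Pr\Big\{E_X\Big|\sum_{i,j\le n}h(X_i,Y_j)\Big| > t\Big\}.
\end{aligned}$$

Proof. Let $\{Z_i\}$ be a sequence of independent random variables such that $E|\sum_i Z_i| \le s$
and let $\{Z_i'\}$ be an independent copy of $\{Z_i\}$. Then, by Chebyshev's inequality,
$\Pr\{|\sum_i Z_i'| \le 2s\} \ge 1/2$. So, for any $t > 0$,

$$\begin{aligned}
\Pr\Big\{\Big|\sum_i Z_i\Big| > 2t+2s\Big\}
&\le 2\Pr\Big\{\Big|\sum_i Z_i'\Big| \le 2s,\ \Big|\sum_i Z_i\Big| > 2t+2s\Big\}\\
&\le 2\Pr\Big\{\Big|\sum_i (Z_i - Z_i')\Big| > 2t\Big\}\\
&= 2\Pr\Big\{\Big|\sum_i \varepsilon_i(Z_i - Z_i')\Big| > 2t\Big\}\\
&\le 2\Pr\Big\{\Big|\sum_i \varepsilon_i Z_i\Big| > t\Big\} + 2\Pr\Big\{\Big|\sum_i \varepsilon_i Z_i'\Big| > t\Big\}\\
&= 4\Pr\Big\{\Big|\sum_i \varepsilon_i Z_i\Big| > t\Big\}.
\end{aligned}$$

Using the above inequality conditionally we get

$$\Pr\Big\{\Big|\sum_{i,j\le n}h(X_i,Y_j)\Big| > 10t\Big\} \le 4\Pr\Big\{\Big|\sum_{i,j\le n}\varepsilon_i h(X_i,Y_j)\Big| > 4t\Big\} + \Pr\Big\{E_X\Big|\sum_{i,j\le n}h(X_i,Y_j)\Big| > t\Big\}$$

and

$$\Pr\Big\{\Big|\sum_{i,j\le n}\varepsilon_i h(X_i,Y_j)\Big| > 4t\Big\} \le 4\Pr\Big\{\Big|\sum_{i,j\le n}\varepsilon_i\tilde\varepsilon_j h(X_i,Y_j)\Big| > t\Big\} + \Pr\Big\{E_Y\Big|\sum_{i,j\le n}\varepsilon_i h(X_i,Y_j)\Big| > t\Big\}.$$

□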
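The one-dimensional symmetrization inequality used in the proof above, $\Pr\{|\sum_i Z_i| > 2t+2s\} \le 4\Pr\{|\sum_i \varepsilon_i Z_i| > t\}$ whenever $E|\sum_i Z_i| \le s$, can be checked exactly on a small example. The following Python sketch is only an illustration and not part of the original argument: the two-point law chosen for $Z_i$ and the names `law_of_sum`, `tail` and `check` are hypothetical, and exact rational arithmetic is used in place of simulation.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy law (an assumption for illustration only):
# Z_i = 3 with probability 1/4 and Z_i = -1 with probability 3/4.
VALUES = [(Fraction(3), Fraction(1, 4)), (Fraction(-1), Fraction(3, 4))]
# Rademacher signs eps_i = +1 or -1, each with probability 1/2.
SIGNS = [(Fraction(1), Fraction(1, 2)), (Fraction(-1), Fraction(1, 2))]


def law_of_sum(n, randomized):
    """Exact law of sum_i Z_i (or of sum_i eps_i Z_i) as {value: probability}."""
    signs = SIGNS if randomized else [(Fraction(1), Fraction(1))]
    atoms = [(v * e, p * q) for v, p in VALUES for e, q in signs]
    law = {}
    for combo in product(atoms, repeat=n):
        total = sum(v for v, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        law[total] = law.get(total, Fraction(0)) + prob
    return law


def tail(law, level):
    """Pr{|S| > level} for a random variable S with the given exact law."""
    return sum(p for v, p in law.items() if abs(v) > level)


def check(n, t_values):
    """True if Pr{|sum Z| > 2t + 2s} <= 4 Pr{|sum eps Z| > t} for every t given."""
    plain = law_of_sum(n, randomized=False)
    rand = law_of_sum(n, randomized=True)
    s = sum(abs(v) * p for v, p in plain.items())  # s = E|sum_i Z_i|
    return all(tail(plain, 2 * t + 2 * s) <= 4 * tail(rand, t) for t in t_values)
```

Calling `check(4, [Fraction(k, 2) for k in range(1, 13)])` compares both sides for four summands over a grid of levels $t$; since the inequality is a theorem, the check is expected to succeed for any law of the $Z_i$ with independent coordinates.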



The next lemma shows that if the second moment and the conditional second 
moment of a canonical kernel h are suitably truncated, then Talagrand's inequality 
(Lemma 2.8) allows control of the last two terms on the right hand side of the 
inequality in Lemma 4.1. 

Lemma 4.2. Let $h$ be a canonical kernel such that

$$Eh^2(X,Y) \le c^2\log n \quad\text{and}\quad E_Y h^2(X,Y) \le c^2 2^n \quad X\text{-a.s.}$$

for some $c < \infty$. Then we have that, for some universal constant $C$,

$$\Pr\Big\{E_Y\Big|\sum_{i,j\le 2^n}h(X_i,Y_j)\Big| > cC2^n\log n\Big\} \le n^{-2}.$$

Proof. We can assume $c = 1$. If we define

$$Z := E_Y\Big|\sum_{i,j\le 2^n}h(X_i,Y_j)\Big|$$

then

$$Z = \sup\Big\{\sum_{i\le 2^n} E_Y\Big(\sum_{j\le 2^n}h(X_i,Y_j)\,g(Y)\Big)\Big\},$$

where the supremum is taken over all $g(Y) = g(Y_1,\dots,Y_{2^n})$ with $\|g\|_\infty \le 1$, actually
over a countable $L_1$-norm determining subset of such functions. Thus $Z$ has the same
form as in Lemma 2.8. Then, since

$$E_Y\Big|\sum_{j\le 2^n}h(x,Y_j)\Big| \le \Big(\sum_{j\le 2^n}E_Y h^2(x,Y_j)\Big)^{1/2} \le 2^n$$

and

$$\sum_{i=1}^{2^n}E\Big(E_Y\Big|\sum_{j=1}^{2^n}h(X_i,Y_j)\Big|\Big)^2 \le \sum_{i=1}^{2^n}E\Big(\sum_{j=1}^{2^n}E_Y h^2(X_i,Y_j)\Big) = 2^{2n}Eh^2 \le 2^{2n}\log n,$$

we can take

$$U = 2^n \quad\text{and}\quad V = 2^{2n}\log n \tag{4.1}$$

in Talagrand's exponential bound for $Z$. Moreover

$$EZ \le \Big(E\Big|\sum_{i,j\le 2^n}h(X_i,Y_j)\Big|^2\Big)^{1/2} = 2^n(Eh^2)^{1/2} \le 2^n\log n. \tag{4.2}$$

Now the statement follows by (4.1), (4.2) and the exponential bound in Lemma 2.8.
□
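The equality in (4.2) uses that, for a canonical kernel, the variables $h(X_i,Y_j)$ are pairwise orthogonal, so that $E|\sum_{i,j\le n}h(X_i,Y_j)|^2 = n^2Eh^2$. A minimal Python sketch of this identity on a three-point sample space follows; the weights `W`, the matrix `RAW` and the helper names are hypothetical choices made for illustration, and the canonical kernel is produced by double centering an arbitrary matrix.

```python
from fractions import Fraction
from itertools import product


def pi2(h, w):
    """Doubly centre the kernel matrix h under the weights w (law of X and of Y)."""
    k = len(w)
    row = [sum(h[x][y] * w[y] for y in range(k)) for x in range(k)]  # E_Y h(x, Y)
    col = [sum(h[x][y] * w[x] for x in range(k)) for y in range(k)]  # E_X h(X, y)
    mean = sum(row[x] * w[x] for x in range(k))                      # E h
    return [[h[x][y] - row[x] - col[y] + mean for y in range(k)] for x in range(k)]


def second_moment_of_double_sum(h, w, n):
    """E |sum_{i,j<=n} h(X_i, Y_j)|^2, by exact enumeration of all samples."""
    k = len(w)
    total = Fraction(0)
    for xs in product(range(k), repeat=n):
        for ys in product(range(k), repeat=n):
            prob = Fraction(1)
            for v in xs + ys:
                prob *= w[v]
            s = sum(h[x][y] for x in xs for y in ys)
            total += prob * s * s
    return total


# Hypothetical toy data: a law on three points and an arbitrary kernel matrix.
W = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
RAW = [[Fraction(v) for v in r] for r in ([1, 4, -2], [0, 3, 5], [7, -1, 2])]
H = pi2(RAW, W)  # canonical: every conditional mean vanishes
EH2 = sum(H[x][y] ** 2 * W[x] * W[y] for x in range(3) for y in range(3))
```

With these definitions, `second_moment_of_double_sum(H, W, 2)` can be compared with `4 * EH2`, the case $n = 2$ of the identity.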

The following lemma will allow us to carry out truncations for canonical kernels 
exactly in the same way as we did for randomized kernels in the first four steps of 
the sufficiency proof of Theorem 3.1. 

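Lemma 4.3. For any integrable kernel $h$, $n\in\mathbb{N}$ and $p \ge 1$ we have

$$\Big\|\sum_{i,j\le n}\pi_2h(X_i,Y_j)\Big\|_p \le 4\,\Big\|\sum_{i,j\le n}\varepsilon_i\tilde\varepsilon_j h(X_i,Y_j)\Big\|_p.$$

Proof. Since $\pi_2h$ is canonical, by Jensen's inequality we have that, for all $p \ge 1$,

$$\begin{aligned}
E_X\Big|\sum_{i,j\le 2^n}\pi_2h(X_i,Y_j)\Big|^p
&\le E_X\Big|\sum_{i,j\le 2^n}\big(\pi_2h(X_i,Y_j) - \pi_2h(X_i',Y_j)\big)\Big|^p\\
&= E_X\Big|\sum_{i,j\le 2^n}\varepsilon_i\big(\pi_2h(X_i,Y_j) - \pi_2h(X_i',Y_j)\big)\Big|^p\\
&= E_X\Big|\sum_{i,j\le 2^n}\varepsilon_i\big(h(X_i,Y_j) - E_Yh(X_i,Y_j) - h(X_i',Y_j) + E_Yh(X_i',Y_j)\big)\Big|^p.
\end{aligned}$$

Thus, by the triangle inequality,

$$\begin{aligned}
\Big\|\sum_{i,j\le 2^n}\pi_2h(X_i,Y_j)\Big\|_p
&\le \Big\|\sum_{i,j\le 2^n}\varepsilon_i\big(h(X_i,Y_j) - E_Yh(X_i,Y_j)\big)\Big\|_p
 + \Big\|\sum_{i,j\le 2^n}\varepsilon_i\big(h(X_i',Y_j) - E_Yh(X_i',Y_j)\big)\Big\|_p\\
&= 2\,\Big\|\sum_{i,j\le 2^n}\varepsilon_i\big(h(X_i,Y_j) - E_Yh(X_i,Y_j)\big)\Big\|_p.
\end{aligned}$$

In a similar way we may prove that

$$\Big\|\sum_{i,j\le 2^n}\varepsilon_i\big(h(X_i,Y_j) - E_Yh(X_i,Y_j)\big)\Big\|_p \le 2\,\Big\|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j h(X_i,Y_j)\Big\|_p.$$

□

Now we can prove the main result of this section.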



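THEOREM 4.4. For any canonical kernel $h$ the following two conditions are equivalent:

$$\limsup_{n\to\infty}\frac{1}{n\log\log n}\Big|\sum_{1\le i\ne j\le n}h(X_i,X_j)\Big| < \infty \quad\text{a.s.} \tag{4.3}$$

and

$$\limsup_{n\to\infty}\frac{1}{n\log\log n}\Big|\sum_{1\le i,j\le n}\varepsilon_i\tilde\varepsilon_j h(X_i,Y_j)\Big| < \infty \quad\text{a.s.} \tag{4.4}$$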



Here, again, each of the two limsups is a.s. bounded by a universal constant 
times the other. 

Proof. (4.3) implies (4.4) (even without degeneracy of the kernel) by Lemma 
2.1(b). 

To prove the opposite implication, by Corollary 2.6 it is enough to show that
if (4.4) holds (which is equivalent to the two conditions (3.2) and (3.3) by Theorem
3.1), then

$$\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}h(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty.$$

Since $h$ is canonical, we may replace $h$ by $\pi_2h$ in this series ($h = \pi_2h$). As in the
case of decoupled and randomized kernels, convergence of the series will follow in a
few steps by showing that

$$\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}\pi_2h_n(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty, \tag{4.5}$$

where $h_n = hI_{A_n}$ for suitably chosen sequences of sets $A_n$. We can assume, as in
Theorem 3.1, that $C = 1$ in conditions (3.2) and (3.3).

Step 1. The series in (4.5) converges for

$$A_n = \{(x,y): f_n(x) > 2^n(\log n)^2 \ \text{or}\ f_n(y) > 2^n(\log n)^2\}.$$

By the degeneracy of $h$ we have

$$\begin{aligned}
|Eh_n| &= \big|EhI_{\{f_n(x)>2^n(\log n)^2\}} + EhI_{\{f_n(y)>2^n(\log n)^2\}} - EhI_{\{f_n(x)>2^n(\log n)^2,\ f_n(y)>2^n(\log n)^2\}}\big|\\
&= \big|EhI_{\{f_n(x)>2^n(\log n)^2,\ f_n(y)>2^n(\log n)^2\}}\big|\\
&\le \Pr\{f_n(X) > 2^n(\log n)^2\}^{1/2}\,\Pr\{f_n(Y) > 2^n(\log n)^2\}^{1/2}\\
&\le C2^{-n},
\end{aligned} \tag{4.6}$$

where the last two inequalities follow by (3.3) and (3.8) respectively. We also have

$$\pi_1h_n(x) = \pi_1\big(hI_{\{f_n(y)>2^n(\log n)^2,\ f_n(x)\le 2^n(\log n)^2\}}\big)(x),$$

as can be seen using the decomposition of $h_n$ given in the first line of (4.6) together
with the fact that $E_YhI_{\{f_n(x)>2^n(\log n)^2\}} = 0$. Thus, by Chebyshev's inequality,

$$\begin{aligned}
\sum_n\Pr\Big\{\Big|\sum_{i\le 2^n}\pi_1h_n(X_i)\Big| > c\log n\Big\}
&\le \sum_n\frac{2^n}{c^2(\log n)^2}\,E\big|\pi_1\big(hI_{\{f_n(y)>2^n(\log n)^2,\ f_n(x)\le 2^n(\log n)^2\}}\big)(X)\big|^2\\
&\le \sum_n\frac{2^n}{c^2(\log n)^2}\,E_X\Big[\big(E_YhI_{\{f_n(y)>2^n(\log n)^2\}}\big)^2 I_{\{f_n(X)\le 2^n(\log n)^2\}}\Big]\\
&\le \sum_n\frac{2^n}{c^2(\log n)^2}\,E_X\big(E_YhI_{\{f_n(y)>2^n(\log n)^2\}}\big)^2\\
&\le \sum_n\frac{2^n}{c^2(\log n)^2}\,\Pr\{f_n(Y) > 2^n(\log n)^2\} < \infty,
\end{aligned} \tag{4.7}$$

where in the last line we used (1.5) with $C = 1$ (that is, condition (3.3)) and (3.8).
Finally, as in step 1 of the proof of sufficiency of the symmetrized LIL,

$$\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}h_n(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty. \tag{4.8}$$

Inequalities (4.6)-(4.8) imply (4.5) by Hoeffding's decomposition ((2.1)).
Step 2. The series in (4.5) converges for 

$$A_n \subset \{(x,y): |h(x,y)| > 2^n\log n \ \text{or}\ f_n(x) > 2^n \ \text{or}\ f_n(y) > 2^n\} \cap \{(x,y): \max(f_n(x),f_n(y)) \le 2^n(\log n)^2\}.$$

To prove this we may proceed just as in steps 2-4 of the proof of the symmetrized 
LIL, with only formal changes: note that in steps 2-4 there we used only Chebyshev's 
inequality to bound probabilities; thus Lemma 4.3 reduces proving inequality (4.5) 
here to steps 2-4 in that proof, where the lower bounds for h and f n are even smaller. 

Step 3. The series in (4.5) converges for 



$$A_n = \{(x,y): |h(x,y)| \le 2^n\log n,\ f_n(x) \le 2^n,\ f_n(y) \le 2^n\}.$$

The LIL (4.4) implies that

$$\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j h(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty$$

for some $C < \infty$ by Lemma 2.2(c). Steps 1-4 from the proof of sufficiency in
Theorem 3.1 show that

$$\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j hI_{D_n}(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty,$$

for any $D_n \subset \{(x,y): |h(x,y)| > 2^n/n \ \text{or}\ \max(f_n(x),f_n(y)) > 2^n/\log n\}$, in
particular for $D_n = A_n^c$. Therefore we have

$$\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j h_n(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty \tag{4.9}$$

for some $C < \infty$. In order to deduce (4.5) from (4.9) we show first that we can
replace $h_n$ by $\pi_2h_n$ in (4.9), and then apply Lemmas 4.1 and 4.2 to $\pi_2h_n$. So, we
begin by proving (4.9) for $h_n - \pi_2h_n$ or, what is the same by Hoeffding's decomposition,
we prove (4.9) with $h_n$ replaced by $\pi_1h_n$ and by $Eh_n$. We can write $h_n$ as

$$h_n = h - hI_{\{f_n(x)>2^n\}} - hI_{\{f_n(y)>2^n\}} + hI_{\{f_n(x)>2^n,\ f_n(y)>2^n\}} - hI_{\{|h|>2^n\log n,\ f_n(x)\le 2^n,\ f_n(y)\le 2^n\}}.$$

Then, by the degeneracy of $h$ and (3.3) we have

$$\begin{aligned}
\Big|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j Eh_n\Big|
&\le 2^{2n}\Big(\big|EhI_{\{f_n(x)>2^n,\ f_n(y)>2^n\}}\big| + E|h|I_{\{|h|>2^n\log n\}}\Big)\\
&\le 2^{2n}\Big(\Pr\{f_n(X)>2^n\}^{1/2}\Pr\{f_n(Y)>2^n\}^{1/2} + E|h|I_{\{|h|>2^n\log n\}}\Big).
\end{aligned}$$

Now, we note that (3.2) implies $E|h|I_{\{|h|>2^n\log n\}} \le C2^{-n}$ (as $\Pr\{|h| > u\} \le u^{-2}L_2u$) and

Hence,

$$\Big|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j Eh_n\Big| \le C2^n\log n. \tag{4.10}$$

The above decomposition of $h_n$ together with the degeneracy of $h$ also give

$$\pi_1h_n(x) = -\pi_1\big(hI_{\{f_n(y)>2^n,\ f_n(x)\le 2^n\}}\big)(x) - \pi_1\big(hI_{\{|h|>2^n\log n,\ f_n(x),f_n(y)\le 2^n\}}\big)(x).$$

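So, by Chebyshev's inequality and (3.2), we have

$$\begin{aligned}
\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j\,\pi_1\big(hI_{\{|h(x,y)|>2^n\log n,\ f_n(x),f_n(y)\le 2^n\}}\big)(X_i)\Big| > c2^n\log n\Big\}
&\le \sum_n\frac{2^n}{c\log n}\,E\big|\pi_1\big(hI_{\{|h(x,y)|>2^n\log n,\ f_n(x),f_n(y)\le 2^n\}}\big)(X)\big|\\
&\le \sum_n\frac{1}{c\log n}\,2^{n+1}E|h|I_{\{|h|>2^n\log n\}}\\
&\le CE\Big[|h|\sum_n\frac{2^n}{\log n}I(|h| > 2^n\log n)\Big]\\
&\le CE\frac{h^2}{(L_2|h|)^2} < \infty.
\end{aligned} \tag{4.11}$$

Also, by Chebyshev's inequality, (1.5) with $C = 1$ and (3.8),

$$\begin{aligned}
\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j\,\pi_1\big(hI_{\{f_n(y)>2^n,\ f_n(x)\le 2^n\}}\big)(X_i)\Big| > c2^n\log n\Big\}
&\le \sum_n\frac{1}{c^2\log^2 n}\,E\big|\pi_1\big(hI_{\{f_n(y)>2^n,\ f_n(x)\le 2^n\}}\big)(X)\big|^2\\
&\le \sum_n\frac{1}{c^2\log^2 n}\,E_X\big(E_YhI(f_n(y)>2^n)\big)^2\\
&\le \sum_n\frac{1}{c^2\log^2 n}\,\Pr\{f_n(Y) > 2^n\} < \infty.
\end{aligned} \tag{4.12}$$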

c z log n 

Inequalities (4.9)-(4.12) imply, by Hoeffding's decomposition,

$$\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}\varepsilon_i\tilde\varepsilon_j\,\pi_2h_n(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty \tag{4.13}$$

for some $C < \infty$. By (3.2), $E(\pi_2h_n)^2 \le Eh_n^2 \le C\log n$, and, by the definition of
$A_n$ and (3.2), $E_Y(\pi_2h_n)^2(x) \le 2E_Yh_n^2 + 2Eh_n^2 \le 2^{n+1} + C\log n$, and likewise for
$E_X(\pi_2h_n)^2$. Then, it follows from Lemma 4.2 that

$$\sum_n\Pr\Big\{E_Y\Big|\sum_{i,j\le 2^n}\varepsilon_i\,\pi_2h_n(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty \tag{4.14}$$

for some $C < \infty$, and that, likewise,

$$\sum_n\Pr\Big\{E_X\Big|\sum_{i,j\le 2^n}\pi_2h_n(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty. \tag{4.15}$$

Then, (4.13)-(4.15) give (4.5) by Lemma 4.1, concluding the proof of Step 3.
Steps 1-3 together show that

$$\sum_n\Pr\Big\{\Big|\sum_{i,j\le 2^n}h(X_i,Y_j)\Big| > C2^n\log n\Big\} < \infty,$$

concluding the proof of the theorem. □

5. Arbitrary kernels. Final comments. We conclude with the proof of Theorem
1.1, a conjecture on the LIL for kernels of more than two variables, and several
remarks on the limsup in (1.1) and the limit set of the LIL sequence.

Proof of Theorem 1.1. Conditions (1.2) and (1.3) are sufficient for the LIL for
degenerate kernels by Theorems 3.1 and 4.4.

If the kernel $h$ satisfies the LIL (1.1), then it satisfies the decoupled and
randomized LIL by Lemma 2.1(b). Then, by Theorem 3.1, it also satisfies conditions
(1.2) and (1.3). So, it suffices to prove that if the LIL (1.1) holds then the kernel $h$
is canonical.

Since by (1.2) $E|\pi_2h|^p < \infty$ for any $p < 2$, we have by the Marcinkiewicz type
strong law of large numbers for $U$-statistics (Giné and Zinn, 1992, Theorem 2),

$$\lim_{n\to\infty}\frac{1}{n^{2/p}}\sum_{i\ne j\le n}\pi_2h(X_i,Y_j) = 0 \quad\text{a.s. for all } 0 < p < 2. \tag{5.1}$$

The LIL for $h$ implies the decoupled LIL (2.8) by Lemma 2.2(a), and therefore also
that

$$\lim_{n\to\infty}\frac{1}{n^{2/p}}\sum_{i\ne j\le n}h(X_i,Y_j) = 0 \quad\text{a.s. for all } 0 < p < 2. \tag{5.2}$$

Subtracting (5.1) from (5.2) and using the Hoeffding decomposition we obtain

$$\lim_{n\to\infty} n^{1-2/p}\Big|\sum_{i\le n}\Big(\pi_1h(X_i) + \frac12Eh\Big) + \sum_{j\le n}\Big(\pi_1h(Y_j) + \frac12Eh\Big)\Big| = 0 \quad\text{a.s.}$$

However if $p > 4/3$ this yields, by the CLT or the LIL in $\mathbb{R}$, that

$$\pi_1h(X) + \frac12Eh = 0 \quad\text{a.s.}$$

Since $\pi_1h$ is centered, it follows that $Eh = 0$ and $\pi_1h(X) = 0$ a.s. Hence $h = \pi_2h$ is
canonical for the law of $X$. □
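The subtraction step in the proof above rests on the Hoeffding decomposition of a symmetric kernel, $h(x,y) = \pi_2h(x,y) + \pi_1h(x) + \pi_1h(y) + Eh$. Here we assume the usual normalization $\pi_1h(x) = E_Yh(x,Y) - Eh$, which is consistent with $\pi_1h$ being centered. Summing over $i \ne j \le n$ gives $\sum_{i\ne j\le n}(h - \pi_2h)(X_i,Y_j) = (n-1)\big[\sum_{i\le n}(\pi_1h(X_i) + \tfrac12Eh) + \sum_{j\le n}(\pi_1h(Y_j) + \tfrac12Eh)\big]$. The Python sketch below checks this bookkeeping exactly on a toy three-point space; the weights, the kernel matrix and the function name `both_sides` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

# Hypothetical toy data: law of X (and of Y) on three points, symmetric kernel matrix.
W = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
H = [[Fraction(v) for v in r] for r in ([2, -1, 4], [-1, 0, 3], [4, 3, 5])]
K = len(W)

# Hoeffding projections under the assumed normalization pi_1 h(x) = E_Y h(x, Y) - E h.
EH = sum(H[x][y] * W[x] * W[y] for x in range(K) for y in range(K))
PI1 = [sum(H[x][y] * W[y] for y in range(K)) - EH for x in range(K)]
PI2 = [[H[x][y] - PI1[x] - PI1[y] - EH for y in range(K)] for x in range(K)]


def both_sides(xs, ys):
    """Return (sum over i != j of (h - pi_2 h)(x_i, y_j), the (n - 1)[...] expression)."""
    n = len(xs)
    lhs = sum(
        H[xs[i]][ys[j]] - PI2[xs[i]][ys[j]]
        for i in range(n)
        for j in range(n)
        if i != j
    )
    rhs = (n - 1) * (
        sum(PI1[x] + EH / 2 for x in xs) + sum(PI1[y] + EH / 2 for y in ys)
    )
    return lhs, rhs
```

For any two index samples of equal length, for instance `both_sides([0, 2, 1, 1], [2, 2, 0, 1])`, the two returned values should coincide. Since $n^{1-2/p}$ times the right-hand side must tend to zero, and $1 - 2/p > -1/2$ for $p > 4/3$, the summands have to be degenerate, which is the conclusion drawn in the proof.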

The following conjecture for kernels of more than two variables seems only 
natural. 

Conjecture 5.1. Let $h$ be a kernel of $d$ variables symmetric in its entries. Then
$h$ satisfies the law of the iterated logarithm

$$\limsup_{n\to\infty}\frac{1}{(n\log\log n)^{d/2}}\Big|\sum h(X_{i_1},\dots,X_{i_d})\Big| < \infty \quad\text{a.s.} \tag{5.3}$$

if and only if the following conditions hold:

a) $h$ is canonical for the law of $X$, that is $E_{X_1}h(X_1,\dots,X_d) = 0$ a.s.

and there exists $C < \infty$ such that

b)

$$E\min(h^2,u) \le C(L_2u)^{d-1} \tag{5.4}$$

for all $u > 0$, and

c)

$$\sup\Big\{E\Big[h(X_1,\dots,X_d)\prod_{i=1}^{d}f_i(X_i)\Big] : Ef_i^2(X) \le 1,\ \|f_i\|_\infty < \infty,\ i = 1,\dots,d\Big\} < \infty. \tag{5.5}$$

We know at present that the necessity part of this conjecture is true.

The problem of determining the lim sup in (1.1) when Eh 2 = oo is open and, 
a fortiori, so is the problem of determining the limit set of the LIL sequence. We 
now briefly comment on these questions. The previous results do give the order of 
the limsup in (1.1) up to constants as we show next. In the theorem that follows 
we denote the quantity in (1.3) as $\|h\|_{L_2\to L_2}$.

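Theorem 5.2. Suppose that $h(x,y)$ is canonical for the law of $X$. Then there
exists a universal constant $C$ such that, almost surely,

$$\begin{aligned}
C^{-1}\Big(\|h\|_{L_2\to L_2} + \limsup_{u\to\infty}\Big(\frac{E(h^2\wedge u)}{L_2u}\Big)^{1/2}\Big)
&\le \limsup_{n\to\infty}\frac{1}{nL_2n}\Big|\sum_{1\le i<j\le n}h(X_i,X_j)\Big|\\
&\le C\Big(\|h\|_{L_2\to L_2} + \limsup_{u\to\infty}\Big(\frac{E(h^2\wedge u)}{L_2u}\Big)^{1/2}\Big).
\end{aligned} \tag{5.6}$$

The same inequality holds true if $h$ is arbitrary and $h(X_i,X_j)$ is replaced in (5.6)
by the randomized $\varepsilon_i\varepsilon_jh(X_i,X_j)$, or by the decoupled versions.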

Proof. Lemma 2.1 and the proof of necessity of Theorem 3.1 (see also Corollary
2.4) give the left hand side bound for decoupled and randomized kernels. The right
hand side bound, also for decoupled and randomized kernels, follows from the proof
of sufficiency of Theorem 3.1: let

$$K := \max\Big[\|h\|_{L_2\to L_2},\ \limsup_{u\to\infty}\Big(\frac{E(h^2\wedge u)}{L_2u}\Big)^{1/2}\Big];$$

if $K = 1$, the proof of Theorem 3.1 produces (3.4) for a fixed constant $C$ that could
be computed if necessary, as can be seen from steps 7 and 8 (the only ones that
contribute to the limsup), and if $K \ne 1$, (3.4) with $C$ replaced by $CK$ is obtained
by considering the kernel $h/K$. Then, Corollary 2.5 yields the right hand side of
(5.6). De-randomization as in Section 4 gives the bounds (5.6) for canonical kernels.
□

We know that when $Eh^2 < \infty$ and $h$ is a canonical kernel of $d$ variables, the
limsup in (5.3) is just the quantity in (5.5), and even more, that the limit set of the
sequence

$$\frac{1}{(2n\log\log n)^{d/2}}\sum h(X_{i_1},\dots,X_{i_d})$$

is a.s.

$$\Big\{E\Big[h(X_1,\dots,X_d)\prod_{i=1}^{d}f(X_i)\Big] : Ef^2(X) \le 1\Big\}$$

(Dehling, 1989, for $d = 2$ and Arcones and Giné, 1995, in general). Then, restricting
to kernels of two variables, several concrete questions arise: 1) is any of the two
summands in the bounds (5.6) superfluous?; 2) at least in the case when the kernel $h$
defines a compact operator of $L_2$, can we determine the limit set of the LIL sequence
from the limit set for finite rank $h$ by operator approximation?, and of course, 3)
what is the limit set in general? We will answer 1) by means of examples showing
that, in general, both summands in the bound (5.6) are essential, and, regarding
question 2) we will also determine the limit set for a class of kernels that induce
compact operators in $L_2$. We will show, moreover, that there are kernels $h$ that give
non-compact operators for which the LIL holds (the examples in Giné and Zhang
(1996) define compact operators and suitable modifications will give non-compact
ones). Finally, question 3) will remain open but we will show that the limit set is
always an interval.

Example 5.3. We consider the kernel

$$h(x,y) = \sum_{n=1}^{\infty}\frac{a_n}{b_n}I_n(x)I_n(y), \tag{5.7}$$

where $\{I_n\}$ is a sequence of functions on $\mathbb{R}$ with disjoint supports contained in
$[0,1]$ such that $\int_{\mathbb{R}}I_n(u)\,du = 0$, $I_n(x)\in\{-1,0,1\}$ for each $x\in\mathbb{R}$, the sequence
$\{b_n\}$ is defined by $b_n = \int_{\mathbb{R}}I_n^2(u)\,du$ and $\{a_n\}$ is an arbitrary bounded sequence
of real numbers. Then, if, as will be the case, for $X$, $Y$ i.i.d. uniform on $[0,1]$,
$E|h(X,Y)| < \infty$, $h$ is a canonical kernel for the uniform distribution on $[0,1]$. Since
$\{b_n^{-1/2}I_n\}$ is an orthonormal sequence in $L_2 := L_2(\mathcal{L}(X))$, we have

$$\|h\|_{L_2\to L_2} = \sup_n|a_n|. \tag{5.8}$$

If we further assume that $\{a_n/b_n\}$ is an increasing sequence, then

E(h* An) y ELi 4 + f^OXn+i 
hmsup = hmsup r~=\ • 

u^oo L2U n Li^bn ) 

So, if we choose $a_n = a$ for all $n$ and $I_n$ such that $b_n = \exp[-\exp(a^2n/b)]$ for large
$n$, then

$$\limsup_{u\to\infty}\frac{E(h^2\wedge u)}{L_2u} = b. \tag{5.9}$$

Thus, in this case, the kernel $h$ satisfies the LIL by Theorem 3.1. Moreover, (5.8) and
(5.9) show that the two quantities appearing in the bounds (5.6) are not comparable
(and, in particular, neither of them is superfluous). In this type of examples, the
operator in $L_2$ with kernel $h$ is compact if and only if $\lim_n a_n = 0$, thus showing
that there are canonical kernels $h$ which satisfy the LIL but that do not define a
compact operator on $L_2$.

If $Eh^2 < \infty$, then the operator norm dominates the bound in (5.6), as the
limsup of the normalized truncated second moments of $h$ is zero. Even for kernels
$h$ defining compact operators we may have that it is this second term that dominates
the bound: for $a_n = 1/\sqrt{n}$ and $b_n = 2^{-n}$, consider the kernels
$h_m(x,y) = \sum_{n=m}^{\infty}a_nb_n^{-1}I_n(x)I_n(y)$; then we have $\|h_m\|_{L_2\to L_2} = 1/\sqrt{m} \to 0$ whereas
$\limsup_{u\to\infty}E(h_m^2\wedge u)/L_2u = 1$ for all $m$.
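The value $b$ in (5.9) can be made plausible numerically. For the kernel (5.7), $h^2$ equals $(a_i/b_i)^2$ on an event of probability $b_i^2$, so at the level $u_n = (a_n/b_n)^2$ one has $E(h^2\wedge u_n) = \sum_{i\le n}a_i^2 + u_n\sum_{i>n}b_i^2$. The Python sketch below evaluates the ratio $E(h^2\wedge u_n)/L_2u_n$ for $a_i = a$ and $b_i = \exp[-\exp(a^2i/b)]$, working with $\log u_n$ to avoid underflow. It is only an illustration under stated assumptions: it looks at the levels $u_n$ alone rather than at all $u$, it keeps only the first term of the tail sum (the later ones are far smaller), and the function name is hypothetical.

```python
import math


def ratio_constant_a(a, b, n):
    """E(min(h^2, u_n)) / L_2(u_n) at u_n = (a / b_n)^2, for a_i = a and
    b_i = exp(-exp(a^2 i / b)); L_2 denotes the iterated logarithm log log."""
    c = a * a / b
    # log u_n = 2 log a - 2 log b_n = 2 log a + 2 exp(c n)
    log_u = 2.0 * math.log(a) + 2.0 * math.exp(c * n)
    # First tail term u_n * b_{n+1}^2, computed through its logarithm.
    tail = math.exp(log_u - 2.0 * math.exp(c * (n + 1)))
    return (n * a * a + tail) / math.log(log_u)
```

For example, `ratio_constant_a(1.0, 2.0, 400)` is close to $b = 2$ and `ratio_constant_a(2.0, 0.5, 80)` is close to $b = 0.5$, while by (5.8) the operator norm equals $a$ in both cases, so neither quantity controls the other.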

There is, however, a class of canonical kernels h satisfying the LIL and defining 
compact operators for which the limit set of the LIL sequence is the numerical 
range of the operator defined by h, as is the case when h has finite second moment. 
In the next proposition $H$ will denote the operator on $L_2$ defined by extension of
the equation $Hf(y) = Eh(X,y)f(X)$, $f\in L_\infty(\mathcal{L}(X))$ (this operator exists under
condition (1.3)).

Proposition 5.4. Let $h$ be a canonical kernel for the law of $X$ such that

a)

$$\limsup_{u\to\infty}\frac{E(h^2\wedge u)}{L_2u} = 0 \tag{5.10}$$

and

b) the operator $H$ is a compact operator on $L_2(\mathcal{L}(X))$.

Then, the limit set of the sequence

(5.11)

is almost surely the closure of the set

$$\{Eh(X,Y)f(X)f(Y) : Ef^2(X) \le 1,\ \|f\|_\infty < \infty\}, \tag{5.12}$$

that is, the numerical range of the operator $H$, $\{E(f(X)Hf(X)) : Ef^2(X) \le 1\}$.

Proof. We set, from now on, $L_2 := L_2(\mathcal{L}(X))$. The proof consists in approximating
the operator $H$ with kernel $h$ by suitable operators $H_m$ with simple kernels, in
particular, square integrable kernels. We begin by showing that there exists an
increasing sequence $\mathcal{G}_m$ of finite sub-$\sigma$-algebras of $\mathcal{S}$ such that, if $P_m$ denotes the
orthogonal projection onto the subspace of $\mathcal{G}_m$-measurable functions,

$$\|P_mHf - Hf\|_{L_2} \to 0, \quad f\in L_2.$$

Indeed, $H$ being a compact operator, its range is a separable set in $L_2$. Therefore
we can find a sequence $\{g_i\}\subset L_2$ of simple functions such that the range of $H$ is
contained in the closure of the sequence $\{g_i\}$. Now, it is enough to set

$$\mathcal{G}_m := \sigma(g_1,\dots,g_m)$$

to get the desired property. This is so because, obviously, $P_mg_i \to g_i$ for each $i\in\mathbb{N}$,
and the set $\{f\in L_2 : P_mf \to f \text{ in } L_2 \text{ norm}\}$ is closed in view of $\|P_m\|_{L_2\to L_2} = 1$.

For each $m\in\mathbb{N}$ we define

$$h_m(x,y) = \sum_{\substack{A,B \text{ atoms of } \mathcal{G}_m\\ \Pr(X\in A,\,Y\in B)\ne 0}}\frac{Eh(X,Y)I_A(X)I_B(Y)}{\Pr(X\in A,\,Y\in B)}\,I_A(x)I_B(y),$$

where, as usual, $Y$ is an independent copy of $X$. In other words, $h_m$ is defined by the
condition

$$h_m(X,Y) = E\big(h(X,Y)\,\big|\,\sigma(X^{-1}(\mathcal{G}_m),\,Y^{-1}(\mathcal{G}_m))\big).$$

The operator $H_m$ of $L_2$ with kernel $h_m$ satisfies $H_m = P_mHP_m$, as is seen from
its definition. Then, since $\|P_mHf - Hf\|_{L_2} \to 0$ for any $f\in L_2$, and since $H$ is a
compact operator in $L_2$, we obtain that

$$\lim_{m\to\infty}\|H_m - H\|_{L_2\to L_2} = 0. \tag{5.13}$$

To see this, we note that, since $(P_m - I)H$ is the adjoint of $H(P_m - I)$ and $P_m$ has
norm 1,

$$\|H_m - H\|_{L_2\to L_2} = \|P_mH(P_m - I) + (P_m - I)H\|_{L_2\to L_2} \le 2\|(P_m - I)H\|_{L_2\to L_2};$$

now (5.13) follows by a simple compactness argument.
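The relation $H_m = P_mHP_m$ can be verified by hand in a finite setting, where $H$ and the conditional expectation $P_m$ are matrices and $h_m$ is the block average of $h$ over products of atoms. The Python sketch below does this exactly on a six-point space. Everything in it (the weights `W`, the partition `ATOMS`, the toy kernel `H` and the helper names) is a hypothetical choice made for illustration, not data from the text.

```python
from fractions import Fraction

# Hypothetical toy setting: law of X on six points and a partition into atoms.
W = [Fraction(1, 4), Fraction(1, 4)] + [Fraction(1, 8)] * 4
ATOMS = [[0, 1], [2, 3, 4], [5]]
N = len(W)
# Symmetric kernel values h(x, y) (arbitrary toy numbers).
H = [[Fraction((x + 1) * (y + 1) % 5 - 2) for y in range(N)] for x in range(N)]


def atom_of(x):
    """The atom of the partition that contains the point x."""
    return next(a for a in ATOMS if x in a)


def mass(a):
    """Probability of the atom a."""
    return sum(W[x] for x in a)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)] for i in range(N)]


def block_average(x, y):
    """h_m(x, y) = E[h I_A(X) I_B(Y)] / Pr(A x B) for the atoms A of x and B of y."""
    a, b = atom_of(x), atom_of(y)
    num = sum(H[u][v] * W[u] * W[v] for u in a for v in b)
    return num / (mass(a) * mass(b))


# Matrix of H: (Hf)(y) = sum_x h(x, y) f(x) W[x].
M = [[H[x][y] * W[x] for x in range(N)] for y in range(N)]
# Matrix of P: averaging over the atom that contains y.
P = [[W[x] / mass(atom_of(y)) if x in atom_of(y) else Fraction(0) for x in range(N)]
     for y in range(N)]
# Matrix of the operator with kernel h_m.
HM = [[block_average(x, y) for y in range(N)] for x in range(N)]
M_M = [[HM[x][y] * W[x] for x in range(N)] for y in range(N)]
```

Comparing `matmul(matmul(P, M), P)` with `M_M` entry by entry confirms the identity in this toy case; refining the partition makes $h_m$ closer to $h$, which is the mechanism behind (5.13).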

The result follows from the previous observation together with Theorem 5.2
applied to $h_m$ and to $h - h_m$, by a standard approximation argument that we now
sketch. Before we do this, we should note that the closure in $L_2$ of the set (5.12)
is the numerical range of $H$ because bounded functions are dense in $L_2$, the unit
ball of $L_2$ is weakly compact and if $f_n \to f$ weakly, with $\|f_n\|_{L_2} \le 1$, then, by
compactness of $H$, $Hf_n \to Hf$ weakly. Let us write $(\cdot,\cdot)$ for the inner product in
$L_2$, set

$$L := \{(Hf,f) : \|f\|_{L_2} \le 1\}$$
and, for any kernel g(x,y) of two variables, 

If $x\in L$ let $f\in L_2$ with $\|f\|_{L_2} \le 1$ be such that $x = (Hf,f)$. Then, by the LIL
for kernels with finite second moment, given $m\in\mathbb{N}$, for almost every $\omega$ there is a
subsequence such that

$$a_{n_k(\omega)}(h_m(\omega)) \to (H_mf,f). \tag{5.14}$$

Also, since $h$ satisfies (5.10) and $h_m$ has finite second moment, Theorem 5.2 gives

$$\limsup_n|a_n(h_m - h)| \le K\|H_m - H\|_{L_2\to L_2} \quad\text{a.s.} \tag{5.15}$$

Moreover, by (5.13),

$$(H_mg,g) \to (Hg,g), \quad g\in L_2. \tag{5.16}$$

Combining these three limits we obtain that $x$ is a.s. a limit point of the sequence
$\{a_n(h)\}$. Conversely, suppose now that $x$ is a limit point of this sequence. Then,
by (5.15), given $\varepsilon > 0$, for all $m$ large enough and for almost every $\omega$ there exists a
subsequence such that

$$|x - a_{n_k(\omega)}(h_m(\omega))| < \frac{\varepsilon}{2}.$$

Therefore, by the LIL for square integrable kernels and (5.16), there is $f\in L_2$ with
$\|f\|_{L_2} \le 1$ such that

$$|x - (Hf,f)| < \varepsilon.$$

So, taking $\varepsilon = 1/n$, there is a sequence $f_n$ in the unit ball of $L_2$ such that

$$x = \lim_n(Hf_n,f_n).$$

Since the unit ball of $L_2$ is weakly compact, the sequence $\{f_n\}$ has a subsequence
$\{f_{n_k}\}$ that converges weakly to a function $f$ in the unit ball of $L_2$. It then follows
by compactness of $H$ that $x = (Hf,f)$, that is, $x\in L$. □

For example, the previous proposition applies to the kernels $h$ of Example 5.3 for
$a_n = n^{-1/2}\ell(n)$ and $b_n = 2^{-n}$, where $\ell(n)$ is any slowly varying function tending to
zero as $n\to\infty$. However, if $\ell(n) = 1$ then $h$ still satisfies the LIL (1.1) by Theorem
1.1 and defines a compact operator in $L_2$, but Proposition 5.4 does not apply to it;
actually, we do not know what the limit set is in this case.

As mentioned, the problem of determining the a.s. limit set of the sequence 
(5.11) in the general case remains open but we can show that it is an interval. 

Proposition 5.5. Let h be a canonical kernel satisfying conditions (1.2) and (1.3). 
Then, the limit set of the LIL sequence (5.11) is an interval. 

Proof. To prove that the limit set of the sequence (5.11) is an interval, it suffices
to show that the difference of two consecutive terms of the sequence tends to zero
a.s. By (1.2) and the law of large numbers for $U$-statistics (or by the LIL), this
reduces to showing that

$$\frac{1}{n\log\log n}\sum_{1\le i<n}h(X_i,X_n) \to 0 \quad\text{a.s.} \tag{5.17}$$

We will first prove

$$\frac{1}{n\log\log n}\sum_{1\le i<n}\varepsilon_ih(X_i,Y_n) \to 0 \quad\text{a.s.} \tag{5.18}$$

and then will show that $\varepsilon_i$ can be removed and that $Y_n$ can be replaced by $X_n$.
To prove (5.18), it is enough to prove that for all $\delta > 0$

$$\sum_n\Pr\Big\{\max_{2^{n-1}<k\le 2^n}\frac{1}{2^n\log n}\Big|\sum_{1\le i<k}\varepsilon_ih(X_i,Y_k)\Big| > \delta\Big\} < \infty \tag{5.19}$$

(see e.g. the proof of Corollary 2.4). Let $h_n = hI_{A_n}$ and $\bar h_n = h - h_n$, where

$$A_n = \{(x,y) : |h(x,y)| \le 2^n\log n,\ f_n(y) \le 2^n(\log n)^2\}.$$

Then as in Steps 1 and 2 of the proof of Theorem 3.1 we get

$$\sum_n\Pr\Big\{\max_{2^{n-1}<k\le 2^n}\frac{1}{2^n\log n}\Big|\sum_{1\le i<k}\varepsilon_i\bar h_n(X_i,Y_k)\Big| > \delta\Big\} < \infty.$$

In order to prove

$$\sum_n\Pr\Big\{\max_{2^{n-1}<k\le 2^n}\frac{1}{2^n\log n}\Big|\sum_{1\le i<k}\varepsilon_ih_n(X_i,Y_k)\Big| > \delta\Big\}
\le \sum_n2^n\Pr\Big\{\Big|\sum_{1\le i\le 2^n}\varepsilon_ih_n(X_i,Y)\Big| > \delta2^n\log n\Big\} < \infty$$

we apply Chebyshev's inequality as in Step 3, reducing the above inequality to
convergence of the two series

£ ^ooW^^ 1 ' m (X2 ' y) < °°- 

But these two series converge, just like the first and second series in Step 3. (5.19)
is thus proved.

Next we show that we can remove the Rademacher variables from (5.18), that
is, that (5.18) implies

$$\frac{1}{n\log\log n}\sum_{1\le i<n}h(X_i,Y_n) \to 0 \quad\text{a.s.} \tag{5.20}$$

Let $\{X_i'\}$ be a copy of $\{X_i\}$, independent of $\{X_i\}$ and $\{Y_i\}$, and set

$$\xi_n := \frac{1}{n\log\log n}\sum_{1\le i<n}h(X_i,Y_n), \qquad \xi_n' := \frac{1}{n\log\log n}\sum_{1\le i<n}h(X_i',Y_n).$$

If (5.18) holds, then $\xi_n - \xi_n' \to 0$ a.s. by Fubini's theorem and the equidistribution
of the variables $X_i$. Hence, (5.20) will follow by a standard argument if $\xi_n \to 0$
in probability conditionally on the sequence $\{Y_i\}$. So, assuming (wlog) that the
variables $X$ and $Y$ are defined on different factors of a product probability space
$\Omega'\times\Omega$, we must show that

$$\frac{1}{a_n}\sum_{1\le i<n}h(X_i,Y_n(\omega)) \to 0 \quad\text{in pr., } \omega\text{-a.s.,} \tag{5.21}$$

where, for ease of notation, we set $a_n := nL_2n$. Now, since

$$\frac{1}{a_n}\sum_{1\le i<n}\varepsilon_ih(X_i,Y_n) \to 0 \quad\text{in pr., } \omega\text{-a.s.}$$

by (5.18), Lévy's inequality applied conditionally on $\{Y_i\}$ gives

$$n\Pr\nolimits_X\{|h(X,Y_n)| > a_n\} \to 0 \quad\text{a.s.} \tag{5.22}$$

and then, Hoffmann-Jørgensen's inequality applied conditionally after truncation,
yields

$$\frac{n}{a_n^2}E_Xh^2(X,Y_n)I_{\{|h(X,Y_n)|\le a_n\}} \to 0 \quad\text{a.s.} \tag{5.23}$$

Moreover,

$$\frac{n}{a_n}E_Xh(X,Y_n)I_{\{|h(X,Y_n)|\le a_n\}} \to 0 \quad\text{a.s.} \tag{5.24}$$

To prove that this last limit holds, note first that, since $E_Xh = 0$,

$$E_Xh(X,Y_n)I_{\{|h(X,Y_n)|\le a_n\}} = -E_Xh(X,Y_n)I_{\{|h(X,Y_n)|>a_n\}},$$

and then that

$$\sum_n\frac{n}{a_n}E|h(X,Y)|I_{\{|h(X,Y)|>a_n\}} < \infty$$

because, after exchanging expectation and sum and then summing on $n$, we see that
this series is bounded by a constant times $Eh^2/(L_2|h|)^2$, which is finite. Now, (5.22)-(5.24)
give that, for all $\varepsilon > 0$ and all $n$ large enough,

$$\Pr\nolimits_X\Big\{\Big|\frac{1}{a_n}\sum_{1\le i<n}h(X_i,Y_n)\Big| > \varepsilon\Big\}
\le n\Pr\nolimits_X\{|h(X,Y_n)| > a_n\} + \frac{4n}{\varepsilon^2a_n^2}E_Xh^2(X,Y_n)I_{\{|h(X,Y_n)|\le a_n\}} \to 0 \quad\text{a.s.,}$$

proving (5.21), hence, (5.20).

Finally, to undecouple, assume (5.20) holds. By Theorem 1.1 and the 0-1 law
we know that

$$\limsup_n\frac{1}{n\log\log n}\Big|\sum_{1\le i<n}h(X_i,X_n)\Big| = C \quad\text{a.s.} \tag{5.25}$$

for some $C < \infty$, and must show that $C = 0$. Then, we can assume that this limsup
is attained by the sequence of even terms, that is,

$$\limsup_n\frac{\big|\sum_{1\le i<2n}h(X_i,X_{2n})\big|}{2n\log\log(2n)} = C \quad\text{a.s.} \tag{5.26}$$

(otherwise we can take the subsequence of odd terms from (5.25) and continue in
the same way as we will now proceed). But

$$\begin{aligned}
\limsup_n\frac{1}{2n\log\log(2n)}\Big|\sum_{1\le i<2n}h(X_i,X_{2n})\Big|
&\le \limsup_n\frac{1}{2n\log\log(2n)}\Big|\sum_{i<2n,\ i\text{ even}}h(X_i,X_{2n})\Big|\\
&\quad + \limsup_n\frac{1}{2n\log\log(2n)}\Big|\sum_{i<2n,\ i\text{ odd}}h(X_i,X_{2n})\Big|\\
&= \limsup_n\frac{1}{2n\log\log(2n)}\Big|\sum_{1\le i<n}h(X_i,X_n)\Big|
 + \limsup_n\frac{1}{2n\log\log(2n)}\Big|\sum_{1\le i\le n}h(X_i,Y_n)\Big|\\
&= \frac{C}{2}
\end{aligned}$$

by (5.25) and (5.20). This contradicts (5.26) unless $C = 0$, proving (5.17). □